NON-ZERO-SUM TWO-PERSON REPEATED GAMES
WITH INCOMPLETE INFORMATION

by 

Sergiu Hart


Introduction 

An  incomplete  information  environment  is  one  where  at  least 
some  of  the  participants  do  not  possess  all  the  relevant  data.  Much 
interest  has  been  devoted  in  recent  years  to  the  analysis  of  such 
situations. Examples in the economic theory literature include the
principal-agent problem, the theory of auctions, signalling (e.g., in
insurance markets), rational expectations equilibria, and so on.

What  are  the  main  difficulties  in  such  problems?  First,  consider 
the  "informed"  persons — those  who  know  more  than  others.  On  one  hand, 
it  is  to  their  advantage  to  make  use  of  their  additional  information 
(in  order  to  improve  their  own  final  outcome).  On  the  other  hand,  by 
doing  so  they  actually  reveal  this  information — and  their  relative 
advantage  vanishes.  Thus — what  is  the  good  of  being  more  informed,  if 
one  cannot  profit  from  it?  This  type  of  conflict  is  an  essential  issue 
in the analysis of incomplete information environments.

As an idealized example, assume someone has "inside information"
that  a  certain  small  company  has  just  succeeded  in  developing  a  new 
product,  for  which  a  very  profitable  market  exists.  He  thus  expects 
that  the  value  of  the  shares  of  this  company  in  the  Stock  Exchange 
will rise dramatically. Should he immediately buy a large quantity
of  these  shares?  By  doing  so,  he  will  implicitly  signal  to  the  others 
the  success  of  the  company — and  everyone  will  want  to  buy  its  shares, 
raising  their  value  immediately  and  lowering  the  profits  of  the  initially 
informed person. The answer clearly lies in his buying the "right"
quantity of shares: not so large as to draw attention, and not so small
as to make his profit insignificant.

The results of the analysis of such models of incomplete information
usually indicate that some transmission of information does occur
(possibly in an implicit way only; namely, by deducing information from
the actions taken by those possessing it). Thus, there is a need for
communication, and some sort of cooperation may arise (e.g., "trading
information"), even though everything is based on purely selfish
(non-cooperative) motives.

There is yet another conflict, this time for the "uninformed"
participants. Should they trust the information transmitted by the
informed ones? In the Stock Exchange example, maybe the purpose of
buying a large quantity of shares is just to convince everyone that a
technological breakthrough has indeed occurred, leading to heavy buying,
which may finally make a good profit for the one who started it all,
even if the company has developed no new product at all!

Game theory is a tool for studying conflict situations: by
definition, inter-personal conflicts. However, one also obtains as an
outcome the resolution of intra-personal conflicts (like the ones
mentioned above), based on individually rational behaviour. This is true
in particular for games with incomplete information, a class of which
forms the subject of this paper.

An important development in game theory in recent years has been
the study of multi-stage games, especially the so-called repeated
games, where the same game is played repeatedly. This suggests itself
as a good framework for incomplete information games, for two main
reasons.

The first one is that, by its very nature, a repeated game has
enough structure to allow the kinds of complicated behaviour we described
above (and many others as well). There is enough "time" to enable players
to "generate" certain beliefs in other people, or to make deductions,
statistical inferences, and so on. There is also room for threats,
for punishments, and for rewards too.

The second reason is more formal, although closely related to
the first one. Consider an infinitely repeated game with complete
information. A well-known result (called the "Folk Theorem" since its
authorship is not clear) states that the non-cooperative equilibria in
the repeated game precisely correspond to the individually rational and
jointly feasible points in the one-shot game. The importance of this result
is that one obtains cooperative outcomes in the one-shot game from
non-cooperative behaviour in the infinite game. Thus, the cooperation
we usually observe is explained here not as an outcome of altruistic
motives, but of purely selfish non-cooperative ones (which many feel are
the only rational ones).

One is therefore led in a natural way to the study of repeated games
of incomplete information. The first research on these was done in the
Mathematica [1966-68] reports, in particular by Aumann, Maschler and
Stearns. It turned out that the very complex structure of these games
(which, as we pointed out above, is one of the reasons for studying
them) creates many difficulties. To date, essentially only two-person
zero-sum games have been completely analyzed (see the forthcoming book
of Mertens and Zamir [1980], or the notes of Sorin [1980] for details).

As for the non-zero-sum case (still with only two players), a first
study was done by Aumann, Maschler and Stearns [1968]. They
characterized a special class of equilibria in the so-called standard
one-sided information case, where one player has more information
than the other one, and both observe during the play all the
actions taken. These equilibria, called "enforceable joint plans",
essentially consist of a transmission of information from the informed
to the uninformed player ("signalling"), followed by a completely
non-revealing play from then on (similar to the Folk Theorem). Moreover,
they showed that this does not exhaust all equilibria: one could have
joint randomizations of enforceable joint plans, and so on.

Our main result in this paper is the complete characterization
of all equilibria in such games. We will show that every equilibrium
is equivalent to a collection of non-revealing "plans", one of which is
chosen at random. This choice is made via a sequence of communications,
which are of two types: signalling (i.e., implicit transmission of
information), and jointly controlled randomizations (i.e., "lotteries"
in which no one player can unilaterally change the probabilities).

Thus, we are able to characterize in a formal way all the kinds
of cooperation and communication that arise out of non-cooperative
behaviour in these games; moreover, we obtain a precise structure that
guarantees it does not pay any player to do anything else (e.g., revealing
less or more, or double-crossing, cheating, and so on). We would like
to point out that the model is not the most general possible (in
particular, in terms of the information structure); this paper is to be
regarded as a first step in the analysis of non-zero-sum repeated games
with incomplete information.

The  formal  model  is  described  in  Section  2,  together  with  various 
notions  of  equilibrium.  The  main  results  are  stated  in  Section  3,  which 
also  includes  additional  discussion  and  intuitive  interpretations. 

Sections 4 and 5 are devoted to the two parts of the proof, and in
Section 6 we present some results on enforceable joint plans. We would
like to point out that Sorin [1981] has recently proved the existence of
such equilibria whenever the number of possible games is two.

Some notation: $\mathbb{R}$ is the real line, and $\mathbb{R}^n$ the
n-dimensional Euclidean space. For vectors $x = (x_1, \ldots, x_n)$ and
$y = (y_1, \ldots, y_n)$ in $\mathbb{R}^n$, $x \ge y$ means $x_i \ge y_i$
for all $i = 1, 2, \ldots, n$, and $x \cdot y$ is the scalar product
$\sum_{i=1}^{n} x_i y_i$. For a finite set $L$, $|L|$ is the number of
elements of $L$, and $\mathbb{R}^L$ the $|L|$-dimensional Euclidean space
with coordinates indexed by the members of $L$ (thus, we write
$x = (x_\ell)_{\ell \in L}$ for $x$ in $\mathbb{R}^L$). The unit simplex
in $\mathbb{R}^L$ will be denoted by

$\Delta^L = \{ x \in \mathbb{R}^L : x_\ell \ge 0 \text{ for all } \ell \text{ in } L, \ \sum_{\ell \in L} x_\ell = 1 \}$ .

Finally, $\mathbb{N}$ is the set of positive integers $\{1, 2, \ldots\}$.

2.  The  Model 

The  class  of  games  we  study  is  given  by  the  following: 

(i)  Two  players,  player  1  and  player  2. 

(ii) A finite set $I$ of choices for player 1 and a finite set
$J$ of choices for player 2; $I$ and $J$ each contain at least two
elements.

(iii) A finite set $K$ of games; to each $k$ in $K$ there
corresponds a pair of $I \times J$ matrices $(A^k, B^k)$, with

$A^k = (A^k(i,j))_{i \in I,\, j \in J}$ ,  $B^k = (B^k(i,j))_{i \in I,\, j \in J}$ .

(iv) A probability vector $p = (p^k)_{k \in K}$ on the set $K$
(i.e., $p \in \Delta^K$; without loss of generality, we assume $p^k > 0$
for all $k$ in $K$ - otherwise, we may discard those $k$ that have zero
probability).

Based on (i)-(iv), a game of incomplete information $\Gamma_\infty(p)$ is
given as follows:

(v) An element $\kappa$ of $K$ is chosen according to the probability
vector $p$; player 1 is told $\kappa$, player 2 is not.

(vi) At each stage $t = 1, 2, \ldots$, player 1 chooses an element $i_t$
in $I$ and player 2 chooses an element $j_t$ in $J$; the choices are made
simultaneously (or, without either player knowing what the other did).

(vii) Both players are then told the pair $(i_t, j_t)$, and they get
the payoffs $A^\kappa(i_t, j_t)$ and $B^\kappa(i_t, j_t)$, respectively
(but they do not observe these payoffs).

(viii)  Both  players  have  perfect  recall  (i.e.,  they  do  not  forget 
what  they  were  told  at  all  previous  stages). 

(ix) All of (i)-(viii) is common knowledge to both players (see
Aumann [1976] for a precise definition).

Usually, (v) is called one-sided information (see also the
discussion below), and (vii) and (viii) are called standard information.
Note that the players observe only the actual choices $i_t$ and $j_t$, and
not the randomizations used.

Following Harsanyi [1967-68], this can be equivalently viewed as a
game with complete but imperfect information (namely, where the
uncertainty players have is not about the "rules of the game", e.g.,
payoffs, but only about moves previously made, by the players or by
chance). This is done by adding a stage $t = 0$, at which "nature"
chooses an element $\kappa$ of $K$ according to the probability $p$. At each
stage $t = 1, 2, \ldots$, the information player 2 has consists of the
sequence of previous choices by both players: $(i_1, j_1), (i_2, j_2),
\ldots, (i_{t-1}, j_{t-1})$. As for player 1, he in addition knows $\kappa$.

This completes the description of $\Gamma_\infty(p)$. It should be pointed
out that more general games can be made to fit into this model. In
particular, consider the case where player 1 does not have full
information on $\kappa$, but player 2 knows even less (see Mertens & Zamir
[1980, Ch. III]). Formally, a partition of the set $K$ is given for each
player, who is informed only of which element of his partition contains
the chosen $\kappa$. For example, let $K = \{1,2,3,4,5\}$, let the partition
of player 1 be $\{1,2\}, \{3\}, \{4\}, \{5\}$, and let that of player 2 be
$\{1,2,3\}, \{4,5\}$. The (common) prior is $p = (1/5, 1/5, 1/5, 1/5, 1/5)$.
First, we observe that both players distinguish between $\{1,2,3\}$ and
$\{4,5\}$; thus, there are two completely disjoint games. In the first,
neither player distinguishes between 1 and 2; therefore, this corresponds
to $K' = \{\{1,2\}, \{3\}\}$ and $p' = (2/3, 1/3)$, where the payoff matrix
for $\{1,2\}$ is $A^{\{1,2\}} = (1/2) A^1 + (1/2) A^2$, and similarly for
$B$. Note that 2/3 is the conditional probability that $\kappa \in \{1,2\}$
given $\kappa \in \{1,2,3\}$, 1/3 is $P(\kappa = 3 \mid \kappa \in \{1,2,3\})$,
and $1/2 = P(\kappa = 1 \mid \kappa \in \{1,2\}) = P(\kappa = 2 \mid \kappa \in \{1,2\})$.
In the second game, $K'' = \{4,5\}$ and $p'' = (1/2, 1/2)$. Thus, the
original game has been decomposed into two games, each fitting our model.
It should be clear how to generalize the construction given for this
example.
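The decomposition in this example can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names `decompose` and `merged_weights` are hypothetical, and the code assumes (as in the example) that every cell of player 1's partition lies inside a single cell of player 2's partition.

```python
from fractions import Fraction

def decompose(prior, partition1, partition2):
    """Split a two-partition model into one-sided-information games.

    prior: dict mapping each state to its probability.
    partition1, partition2: lists of frozensets; player 1's cells are
    assumed to refine player 2's cells (an assumption of this sketch).
    Returns, for each cell of player 2, the list of player 1's cells it
    contains (the new "games") and their conditional probabilities.
    """
    games = []
    for cell2 in partition2:
        mass2 = sum(prior[k] for k in cell2)
        types = [c for c in partition1 if c <= cell2]
        probs = [sum(prior[k] for k in c) / mass2 for c in types]
        games.append((types, probs))
    return games

def merged_weights(prior, cell1):
    """Weights used to average the payoff matrices inside one cell of player 1."""
    mass = sum(prior[k] for k in cell1)
    return {k: prior[k] / mass for k in cell1}

# The example from the text: K = {1,...,5} with a uniform prior.
prior = {k: Fraction(1, 5) for k in range(1, 6)}
partition1 = [frozenset({1, 2}), frozenset({3}), frozenset({4}), frozenset({5})]
partition2 = [frozenset({1, 2, 3}), frozenset({4, 5})]
games = decompose(prior, partition1, partition2)
```

Here `games[0]` corresponds to $K'$ with $p' = (2/3, 1/3)$, `games[1]` to $K''$ with $p'' = (1/2, 1/2)$, and `merged_weights(prior, frozenset({1, 2}))` gives the weights $(1/2, 1/2)$ used to form $A^{\{1,2\}}$.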

Next we describe the sets of strategies of the players in
$\Gamma_\infty(p)$. For each $t = 1, 2, \ldots$, let $H_t$ be the set of
histories up to (but not including) stage $t$, namely,

$H_t = (I \times J)^{t-1}$ .

A pure strategy $\sigma$ of player 1 is a collection
$\sigma = (\sigma_t)_{t=1}^{\infty}$, where

(2.1)  $\sigma_t : H_t \times K \to I$

for all $t = 1, 2, \ldots$ . Thus, for every history $h_t$ in $H_t$ and every
$k$ in $K$ (the "true" game $k$ chosen), $\sigma_t(h_t; k)$ is the choice
$i_t$ made by player 1. In a similar way, a pure strategy $\tau$ of player 2
is $\tau = (\tau_t)_{t=1}^{\infty}$, where

(2.2)  $\tau_t : H_t \to J$

for all $t = 1, 2, \ldots$ .

A mixed strategy is, as usual, a probability distribution over the
set of pure strategies. Since $\Gamma_\infty(p)$ is a game with perfect
recall, one can restrict the study to behaviour strategies (cf. Kuhn
[1953] and Aumann [1964]), where players make independent randomizations
at each move. A behaviour strategy is thus defined in the same way as a
pure strategy, with (2.1) replaced by

(2.3)  $\sigma_t : H_t \times K \to \Delta^I$ ,

and (2.2) replaced by

(2.4)  $\tau_t : H_t \to \Delta^J$ .

Since we never use pure strategies specifically, the term "strategy"
will henceforth mean behaviour or mixed strategy.

We have not yet defined payoffs in $\Gamma_\infty(p)$, only sequences of
payoffs. Given a pair of strategies $(\sigma, \tau)$ of the two players, we
denote

(2.5)  $a_T^k = \frac{1}{T} \sum_{t=1}^{T} A^k(i_t, j_t)$ ,

(2.6)  $\beta_T = \frac{1}{T} \sum_{t=1}^{T} B^\kappa(i_t, j_t)$ ,

for all $T = 1, 2, \ldots$ and all $k$ in $K$. Thus, $a_T^k$ is the average
payoff up to (and including) stage $T$ to player 1, if the true game is
$\kappa = k$; this depends on the choices of the $i_t$'s and $j_t$'s, made
according to $\sigma$ and $\tau$ (actually, only $\sigma(\cdot\,; k)$ and
$\tau$ matter). Let $E^k_{\sigma,\tau}(a_T^k)$ denote its expectation. For
player 2, $\beta_T$ is his average payoff up to $T$; it depends on $\sigma$,
$\tau$ and also on the choice of $\kappa$ (according to $p$). Let
$E_{\sigma,\tau,p}(\beta_T)$ be its expectation.
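The average payoffs (2.5) and (2.6) along one play can be illustrated by a short simulation. The sketch below is not part of the paper: the function `play`, the representation of behaviour strategies as Python callables returning probability lists, and the example matrices are all assumptions made for illustration.

```python
import random

def play(A, B, k, sigma, tau, T, rng):
    """Simulate T stages in state k; return the averages of A^k and B^k.

    sigma(history, k) and tau(history) return probability lists over I
    and J (behaviour strategies as in (2.3) and (2.4)); history is the
    list of past pairs (i, j) and must not be modified by the strategies.
    """
    history = []
    sum_a = sum_b = 0.0
    for _ in range(T):
        x = sigma(history, k)
        y = tau(history)
        i = rng.choices(range(len(x)), weights=x)[0]
        j = rng.choices(range(len(y)), weights=y)[0]
        sum_a += A[k][i][j]
        sum_b += B[k][i][j]
        history.append((i, j))
    return sum_a / T, sum_b / T

# Hypothetical two-state example (states 0 and 1) with B^k = -A^k.
A = {0: [[1, 0], [0, 0]], 1: [[0, 0], [0, 1]]}
B = {k: [[-v for v in row] for row in A[k]] for k in A}
p = [0.5, 0.5]

def sigma(history, k):          # player 1 plays row k (fully revealing)
    return [1.0, 0.0] if k == 0 else [0.0, 1.0]

def tau(history):               # player 2 always plays column 0
    return [1.0, 0.0]

rng = random.Random(0)
a_T = [play(A, B, k, sigma, tau, 100, rng)[0] for k in (0, 1)]
# With deterministic strategies, averaging over k with weights p gives
# the expectation of beta_T.
beta_T = sum(p[k] * play(A, B, k, sigma, tau, 100, rng)[1] for k in (0, 1))
```

Because the strategies in this example are deterministic, the simulated averages coincide with the expectations $E^k_{\sigma,\tau}(a_T^k)$ and $E_{\sigma,\tau,p}(\beta_T)$; with genuinely random strategies one would average over many simulated plays.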

A pair $(\sigma, \tau)$ of strategies is a (Nash) equilibrium point in
$\Gamma_\infty(p)$ if

(2.7)  $\liminf_{T \to \infty} E^k_{\sigma,\tau}(a_T^k) \ge \limsup_{T \to \infty} E^k_{\sigma',\tau}(a_T^k)$

for all strategies $\sigma'$ of player 1 and all $k$ in $K$, and

(2.8)  $\liminf_{T \to \infty} E_{\sigma,\tau,p}(\beta_T) \ge \limsup_{T \to \infty} E_{\sigma,\tau',p}(\beta_T)$

for all strategies $\tau'$ of player 2. If we take $\sigma' = \sigma$ in
(2.7), we get a vector $a = (a^k)_{k \in K}$ such that

(2.9)  $\lim_{T \to \infty} E^k_{\sigma,\tau}(a_T^k) = a^k$

for all $k$ in $K$. Similarly, $\tau' = \tau$ in (2.8) gives $\beta$ with

(2.10)  $\lim_{T \to \infty} E_{\sigma,\tau,p}(\beta_T) = \beta$ .

We will call $a$ and $\beta$ the payoffs of the equilibrium point
$(\sigma, \tau)$.

Note that they are computed ex-post; namely, after the choice of
$\kappa$ was made and player 1 was informed of it. Therefore, player 1
considers his payoffs in each possible state $\kappa = k$, whereas for player
2 only his expectation over $k$ matters. It can be easily checked that
the definition does not change if we replace (2.7) by ex-ante
optimality, namely:

$\liminf_{T \to \infty} E_{\sigma,\tau,p}(a_T) \ge \limsup_{T \to \infty} E_{\sigma',\tau,p}(a_T)$ ,

where $a_T$ is defined in the same way as $\beta_T$ (thus, $a_T = a_T^\kappa$).
Indeed, since the value $k$ of $\kappa$ is in any case part of the information
player 1 has at every stage, he can choose his best response against
$\tau$ independently for each $k$. In the imperfect information version of
$\Gamma_\infty(p)$, the adequate payoff is indeed this expectation over $k$;
in the incomplete information one, the vector of payoffs should be
considered instead since any given "type" $k$ (in Harsanyi's terminology)
does not care about the payoffs to all the other possible types!

A strengthening of the definition of equilibrium is suggested by
the results obtained in the zero-sum case (i.e., where
$A^k + B^k \equiv 0$ for all $k$ in $K$). A pair of strategies
$(\sigma, \tau)$ is a uniform equilibrium point in $\Gamma_\infty(p)$ if

(2.11)  $\liminf_{T \to \infty} E^k_{\sigma,\tau}(a_T^k) \ge \limsup_{T \to \infty} \bigl( \sup_{\sigma'} E^k_{\sigma',\tau}(a_T^k) \bigr)$

for all $k$ in $K$, and

(2.12)  $\liminf_{T \to \infty} E_{\sigma,\tau,p}(\beta_T) \ge \limsup_{T \to \infty} \bigl( \sup_{\tau'} E_{\sigma,\tau',p}(\beta_T) \bigr)$ .

Clearly, every uniform equilibrium point is also an equilibrium point
(if we change the order of limsup and sup in (2.11) and (2.12), we
obtain (2.7) and (2.8), respectively). The payoffs $(a, \beta)$ are given by
(2.9) and (2.10).

To emphasize the difference between the two definitions, we
translate them into the "$\varepsilon$-language". A uniform equilibrium
satisfies the following: for every $\varepsilon > 0$ there exists
$T_0 = T_0(\varepsilon)$ large enough such that for all $T \ge T_0$,

(2.13)  $E^k_{\sigma',\tau}(a_T^k) \le a^k + \varepsilon$  and  $E_{\sigma,\tau',p}(\beta_T) \le \beta + \varepsilon$ ,

for all $k$ in $K$ and all strategies $\sigma'$ of player 1 and $\tau'$ of
player 2. For a regular equilibrium (according to (2.7) and (2.8)), $T_0$
may also depend on $\sigma'$ and $\tau'$. The importance of (2.13) holding
uniformly in $\sigma'$ and $\tau'$ is that it implies that $(\sigma, \tau)$
generates an $\varepsilon$-equilibrium in all long enough but finite games
$\Gamma_T(p)$ (which are defined in the same way as $\Gamma_\infty(p)$, but
last only $T$ stages). Since $\Gamma_\infty(p)$ may be viewed as an
"idealization" of such games, the uniform definition may seem more
appropriate.

However,  we  will  prove  the  following  result: 

Proposition 2.14: The sets of payoffs of equilibrium points and
of uniform equilibrium points in $\Gamma_\infty(p)$ coincide.

Thus, although it is clear that there exist equilibrium points
that are not uniform, they are always payoff-equivalent to uniform ones.

Other definitions of equilibrium are also possible. For example,
one could use Abel instead of Cesàro summability; namely, limits as
$\rho > 0$ converges to 0 of

$E \Bigl( \rho \sum_{t=1}^{\infty} \frac{x_t}{(1+\rho)^t} \Bigr)$ ,

where $(x_t)_{t=1}^{\infty}$ is the corresponding sequence of payoffs (this
is interpreted as the limit, as the interest rate goes to zero, of the
current value). Banach limits (see Section 4) can also be used.
However, in all cases the set of equilibrium payoffs will not change.
In view of this result, we can unambiguously define the set of
equilibrium payoffs of $\Gamma_\infty(p)$. Our main result will be a
characterization of this set.
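To make the two summability notions concrete, the following sketch compares the Cesàro average with the Abel (discounted) average for one deterministic payoff stream. The function names and the alternating stream 1, 0, 1, 0, ... are illustrative assumptions, and the infinite Abel sum is truncated at a finite horizon.

```python
def cesaro_mean(x, T):
    """Average of x(1), ..., x(T)."""
    return sum(x(t) for t in range(1, T + 1)) / T

def abel_mean(x, rho, horizon):
    """rho * sum_{t>=1} x(t) / (1 + rho)**t, truncated after `horizon` terms."""
    return rho * sum(x(t) / (1.0 + rho) ** t for t in range(1, horizon + 1))

def x(t):
    """Illustrative payoff stream 1, 0, 1, 0, ..."""
    return 1.0 if t % 2 == 1 else 0.0

cesaro = cesaro_mean(x, 10000)
abel = abel_mean(x, 0.001, 50000)
```

For this stream the Abel average equals $(1+\rho)/(2+\rho)$ in closed form, so both averages approach 1/2, the first as $T$ grows and the second as $\rho$ decreases to 0.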

Can one further strengthen the definition of equilibrium by
changing the order of limit and expectation? The answer is no, as an
example by J.-F. Mertens and the author shows already in the zero-sum
case.

3.  Statement and Interpretation of the Main Result

In this section we state our main result: the characterization of
all equilibria in $\Gamma_\infty(p)$.

The Folk Theorem in the complete information case states that the
set of equilibrium payoffs coincides with the set of feasible and
individually rational payoffs. We consider first the notion of
individual rationality; it is to be understood in the sense of what each
player cannot be prevented from obtaining (i.e., the "minmax"). The
study of the zero-sum case (Aumann & Maschler [1966]) enables us to
characterize individual rationality in $\Gamma_\infty(p)$.

We need some notation first. Let $p$ be a probability vector in
$\Delta^K$; let $p \cdot A$ be the matrix $\sum_{k \in K} p^k A^k$ (i.e.,
whose $(i,j)$th element is $\sum_{k \in K} p^k A^k(i,j)$). Consider the
two-person zero-sum game with payoffs to player 1 given by $p \cdot A$, and
let $(\mathrm{val}_1 A)(p)$ denote its value (when played just once). Thus,

(3.1)  $(\mathrm{val}_1 A)(p) = \max_{x \in \Delta^I} \min_{y \in \Delta^J} (p \cdot A)(x,y) = \min_{y \in \Delta^J} \max_{x \in \Delta^I} (p \cdot A)(x,y)$ ,

where $x = (x_i)_{i \in I}$, $y = (y_j)_{j \in J}$, and

$(p \cdot A)(x,y) = \sum_{i \in I} \sum_{j \in J} x_i y_j \sum_{k \in K} p^k A^k(i,j)$ .

Similarly, let $(\mathrm{val}_2 B)(p)$ be the value to player 2 of the
two-person zero-sum game with payoff matrix $p \cdot B$ to player 2. Clearly,

(3.2)  $(\mathrm{val}_2 B)(p) = - (\mathrm{val}_1 (-B))(p)$ .
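As a small numerical illustration of (3.1) and (3.2), the sketch below computes the value of the averaged game when $|I| = |J| = 2$, using the textbook closed form for 2x2 zero-sum games rather than a general linear program. The function names and the two example matrices are assumptions of this sketch, not taken from the paper.

```python
def value_2x2(M):
    """Value to the row (maximizing) player of a 2x2 zero-sum matrix game."""
    (a, b), (c, d) = M
    maxmin = max(min(a, b), min(c, d))
    minmax = min(max(a, c), max(b, d))
    if maxmin == minmax:                       # pure saddle point
        return maxmin
    return (a * d - b * c) / (a + d - b - c)   # completely mixed case

def mix(mats, p):
    """The matrix p . A = sum_k p^k A^k for 2x2 matrices."""
    return [[sum(pk * Mk[i][j] for pk, Mk in zip(p, mats)) for j in range(2)]
            for i in range(2)]

def val1(A, p):
    """(val_1 A)(p) as in (3.1): player 1 chooses rows and maximizes."""
    return value_2x2(mix(A, p))

def val2(B, p):
    """(val_2 B)(p) via (3.2): minus the value of the game -p.B to player 1."""
    neg = [[-v for v in row] for row in mix(B, p)]
    return -value_2x2(neg)

# Hypothetical example with two states.
A = [[[1, 0], [0, 0]], [[0, 0], [0, 1]]]
B = [[[-v for v in row] for row in Ak] for Ak in A]
```

In this example $p \cdot A$ is the diagonal matrix with entries $p^1$ and $p^2$, so `val1(A, [q, 1 - q])` equals $q(1-q)$, and in the zero-sum case `val2(B, p)` is its negative.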


For a function $f$ on $\Delta^K$, let vex $f$ denote its convexification;
namely, vex $f$ is the largest convex function on $\Delta^K$ that does not
exceed $f$. We will write $(\mathrm{vex}\ \mathrm{val}_2 B)(p)$ for the
evaluation of the function vex $(\mathrm{val}_2 B)$ at the point $p$.

We can now define: a vector $a = (a^k)_{k \in K}$ in $\mathbb{R}^K$ is an
individually rational payoff vector to player 1 in $\Gamma_\infty(p)$ if

(3.3)  $q \cdot a \ge (\mathrm{val}_1 A)(q)$ ,  for all $q$ in $\Delta^K$ .

A scalar $\beta$ in $\mathbb{R}$ is an individually rational payoff to
player 2 in $\Gamma_\infty(p)$ if

(3.4)  $\beta \ge (\mathrm{vex}\ \mathrm{val}_2 B)(p)$ .
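When $K$ has two elements, $\Delta^K$ can be identified with the interval $[0,1]$ and the convexification can be approximated on a grid by a lower convex hull. The following is a minimal sketch under that assumption; the function name `vex_on_grid` and the test functions are hypothetical, and the grid must be strictly increasing with at least two points.

```python
def vex_on_grid(ps, fs):
    """Largest convex function below the points (ps[n], fs[n]), evaluated on ps."""
    hull = []
    for x, y in zip(ps, fs):
        # Keep only vertices of the lower convex hull (monotone chain).
        while len(hull) >= 2:
            (x1, y1), (x2, y2) = hull[-2], hull[-1]
            if (x2 - x1) * (y - y1) - (y2 - y1) * (x - x1) <= 0:
                hull.pop()      # hull[-1] lies on or above the new chord
            else:
                break
        hull.append((x, y))
    out, seg = [], 0
    for x in ps:
        # Move to the hull segment whose x-range contains x, then interpolate.
        while seg + 1 < len(hull) - 1 and hull[seg + 1][0] < x:
            seg += 1
        (x1, y1), (x2, y2) = hull[seg], hull[seg + 1]
        out.append(y1 + (y2 - y1) * (x - x1) / (x2 - x1))
    return out

ps = [n / 10 for n in range(11)]
concave = [q * (1 - q) for q in ps]                        # vex is identically 0
convex = [(q - 0.5) ** 2 for q in ps]                      # vex equals f
two_wells = [min((q - 0.2) ** 2, (q - 0.8) ** 2) for q in ps]
```

For a concave function the convexification is the chord between the endpoints, for a convex function it is the function itself, and for `two_wells` it is flat (equal to 0) between 0.2 and 0.8.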


These definitions are the correct ones, in view of the following
results. A set $Q$ in $\mathbb{R}^K$ is approachable by player 2 (cf.
Blackwell [1956]) if there exists a strategy $\tau$ of player 2 such that

$\lim_{T \to \infty} \bigl( \sup_{\sigma} E_{\sigma,\tau}(d(Q, a_T)) \bigr) = 0$ ,

where $a_T = (a_T^k)_{k \in K}$ (recall (2.5)), $d$ is the Euclidean
distance in $\mathbb{R}^K$, and the supremum is over all strategies
$\sigma$ of player 1.

Proposition 3.5: Let $a$ be a vector in $\mathbb{R}^K$. Then (3.3) is a
necessary and sufficient condition for the set
$Q = \{ x \in \mathbb{R}^K : x \le a \}$ to be approachable by player 2.

Proof: Blackwell [1956]; for example, see Aumann & Maschler
[1966]. Q.E.D.

Thus, if (3.3) is satisfied, then player 2 can guarantee that the
payoffs to player 1 will not, in the limit, exceed $a^k$ for all $k$ in
$K$ simultaneously. If (3.3) is not satisfied, then given any strategy
of player 2, player 1 has a strategy such that, for at least one $k$
in $K$, he will get more than $a^k$.

For  player  2,  we  have 

Proposition 3.6: Let $\beta$ be a scalar in $\mathbb{R}$. Then (3.4) is a
necessary and sufficient condition for player 1 to have a strategy
$\sigma$ such that

$\limsup_{T \to \infty} \bigl( \sup_{\tau} E_{\sigma,\tau,p}(\beta_T) \bigr) \le \beta$ ,

where $\beta_T$ is given by (2.6) and the supremum is over all strategies
$\tau$ of player 2.

Proof: Aumann & Maschler [1966]; see (3.2). Q.E.D.

Again, this means that player 1 can hold player 2 down to
$(\mathrm{vex}\ \mathrm{val}_2 B)(p)$ in $\Gamma_\infty(p)$, but to no less
than that.

Having completed the study of individual rationality, we come next
to feasibility. Let us consider a simple case first. Fix $i$ in $I$
and $j$ in $J$: is there an equilibrium resulting in the pair $(i,j)$
being chosen at every stage? Clearly, the answer depends on the actions
the players will take "outside of equilibrium", namely, when $(i,j)$ is
no longer being played. Again as in the Folk Theorem, it is easy to see
that the necessary and sufficient condition is precisely individual
rationality for both players (each will use the corresponding strategy
given by Propositions 3.5 and 3.6, respectively, immediately after the
other deviates from $(i,j)$). Therefore, the payoffs
$a = (A^k(i,j))_{k \in K}$ and $\beta = \sum_{k \in K} p^k B^k(i,j)$ will be
equilibrium payoffs in $\Gamma_\infty(p)$ if and only if (3.3) and (3.4)
are satisfied.

This reasoning can now be extended to any convex combination by
using the corresponding frequencies. It generates a class of equilibria
in $\Gamma_\infty(p)$, which result in player 1 actually playing the same
for all $k$ in $K$ (i.e., independent of $k$). Note that this is true only
"in equilibrium" (i.e., so long as there are no defections); "out of
equilibrium", the strategy given by Proposition 3.6 may depend on $k$. We
will thus call these equilibria non-revealing.

To define formally the corresponding payoffs, we denote by
$(A,B)(i,j)$ the vector

$(A,B)(i,j) = \bigl( (A^k(i,j))_{k \in K}, (B^k(i,j))_{k \in K} \bigr) \in \mathbb{R}^K \times \mathbb{R}^K$ ,

for all $i$ in $I$ and $j$ in $J$. Then, let

(3.7)  $F = \mathrm{conv} \{ (A,B)(i,j) : i \in I, j \in J \}$ ,

where "conv" denotes the convex hull of a set. $F$ can be viewed as the
set of feasible vector payoffs (in the one-shot game).

Let $M$ be the maximum absolute value of any possible payoff:

(3.8)  $M = \max \{ |A^k(i,j)|, |B^k(i,j)| : i \in I, j \in J, k \in K \}$ .

We then write $\mathbb{R}^K_M$ for the set of all vectors in $\mathbb{R}^K$,
all of whose coordinates are bounded by $M$. We also put $\mathbb{R}_M$ for
the real interval $[-M, M]$ (thus $\mathbb{R}^K_M = (\mathbb{R}_M)^K$).
Clearly, $F$ is a subset of $\mathbb{R}^K_M \times \mathbb{R}^K_M$.

Finally, we define the set $G$ as follows: it consists of all
triples $(a, \beta, p)$, with $a$ in $\mathbb{R}^K_M$, $\beta$ in
$\mathbb{R}_M$ and $p$ in $\Delta^K$, such that (3.3) and (3.4) are
satisfied, and there exist $c$ and $d$ in $\mathbb{R}^K_M$ with

(3.9)  $(c, d) \in F$ ,

(3.10)  $a \ge c$ and $p \cdot a = p \cdot c$ ,

(3.11)  $\beta = p \cdot d$ .


As in the zero-sum case, we will find it necessary to consider all
the games $\Gamma_\infty(p)$, as $p$ ranges over $\Delta^K$, at the same
time; a triple $(a, \beta, p)$ is understood as $(a, \beta)$ being payoffs in
$\Gamma_\infty(p)$.

In view of our previous discussion, $G$ is essentially the set of
payoffs corresponding to non-revealing equilibria (note that (3.10) can
be restated as: $a^k = c^k$ if $p^k > 0$, $a^k \ge c^k$ otherwise;
therefore, $a$ and $c$ are identical for all relevant games).
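The feasibility part of the definition of $G$, conditions (3.9)-(3.11), can be illustrated as follows. This sketch only covers feasibility: it does not test individual rationality (3.3)-(3.4). The function names, the representation of a point of $F$ by joint frequencies over $I \times J$, and the example matrices are assumptions made for illustration.

```python
def nonrevealing_payoffs(A, B, freq, p):
    """Return (c, d, beta): (c, d) is the point of F generated by the joint
    frequencies freq[(i, j)] as in (3.9), and beta = p . d as in (3.11)."""
    states = range(len(p))
    c = [sum(f * A[k][i][j] for (i, j), f in freq.items()) for k in states]
    d = [sum(f * B[k][i][j] for (i, j), f in freq.items()) for k in states]
    beta = sum(p[k] * d[k] for k in states)
    return c, d, beta

def satisfies_3_10(a, c, p, tol=1e-12):
    """Check (3.10): a >= c coordinatewise and p . a = p . c."""
    dominates = all(ak >= ck - tol for ak, ck in zip(a, c))
    same_mean = abs(sum(pk * (ak - ck) for pk, ak, ck in zip(p, a, c))) <= tol
    return dominates and same_mean

# Hypothetical two-state example with identical payoffs for both players.
A = [[[1, 0], [0, 0]], [[0, 0], [0, 1]]]
B = A
freq = {(0, 0): 0.5, (1, 1): 0.5}    # play (0,0) and (1,1) half the time each
c, d, beta = nonrevealing_payoffs(A, B, freq, [0.5, 0.5])
```

With these frequencies $c = d = (1/2, 1/2)$ and $\beta = 1/2$. The vector $a = (0.5, 0.9)$ satisfies (3.10) when $p = (1, 0)$, since the second state then has probability zero, but not when $p = (1/2, 1/2)$.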

Our main result states that, based on the set $G$, we can
characterize all equilibrium payoffs. We thus define the concept of a
G-process, as follows.

Let $g = (a, \beta, p) \in \mathbb{R}^K_M \times \mathbb{R}_M \times \Delta^K$.
A sequence $\{g_n\}_{n=1}^{\infty} = \{(a_n, \beta_n, p_n)\}_{n=1}^{\infty}$
of $(\mathbb{R}^K_M \times \mathbb{R}_M \times \Delta^K)$-valued random
variables (on some probability space) is called a G-process starting at
$g$ if:

(3.12)  $g_1 = g$ a.s.

(3.13)  There exists a non-decreasing sequence $\{Z_n\}_{n=1}^{\infty}$ of
finite fields with respect to which $\{g_n\}_{n=1}^{\infty}$ is a martingale.

(3.14)  Let $g_\infty$ be an a.s. limit of $g_n$ (as $n \to \infty$); then
$g_\infty \in G$ a.s.

(3.15)  For each $n = 1, 2, \ldots$, either $a_{n+1} = a_n$ a.s., or
$p_{n+1} = p_n$ a.s.

The martingale condition in (3.13) means that $g_n$ is $Z_n$-measurable
and $E(g_{n+1} \mid Z_n) = g_n$ a.s. for all $n$. Together with (3.12), it
implies $E(g_n) = g$ for all $n$. Since the sequence is uniformly bounded,
the Martingale Convergence Theorem implies that it has an a.s. limit; thus
(3.14) is well defined. It then means that
$g_\infty = (a_\infty, \beta_\infty, p_\infty)$ satisfies a.s. individual
rationality for both players (i.e., (3.3) and (3.4)), and also
(3.9)-(3.11).

The last condition (3.15) is slightly unusual; at every step,
either $a_n$ or $p_n$ remains constant (while the other may change, but in
such a way that the conditional expectation does not, by (3.13)). If we
disregard the $\beta_n$ coordinate, such a process may be called a
bi-martingale (see Proposition 3.18 below). The study of such objects will
be taken up in a forthcoming paper of R. J. Aumann and the author.
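The one-step content of (3.13) and (3.15) can be checked mechanically when the successors of a state form a finite list. The sketch below is an illustration under assumptions: states are triples `(a, beta, p)` with `a` and `p` given as lists, the function names are hypothetical, and the check is local (at one node with finitely many successors), whereas (3.15) is a condition on the whole step $n$.

```python
def expectation(children):
    """Probability-weighted mean of states; children = [(prob, (a, beta, p)), ...]."""
    def mean(weighted):
        size = len(weighted[0][1])
        return [sum(w * v[m] for w, v in weighted) for m in range(size)]
    a = mean([(w, g[0]) for w, g in children])
    beta = sum(w * g[1] for w, g in children)
    p = mean([(w, g[2]) for w, g in children])
    return a, beta, p

def close(u, v, tol=1e-12):
    return all(abs(x - y) <= tol for x, y in zip(u, v))

def is_valid_step(parent, children, tol=1e-12):
    """Mean preservation (3.13) plus 'a fixed or p fixed' (3.15) at one node."""
    a, beta, p = parent
    ea, ebeta, ep = expectation(children)
    martingale = close(ea, a, tol) and abs(ebeta - beta) <= tol and close(ep, p, tol)
    a_fixed = all(close(g[0], a, tol) for _, g in children)
    p_fixed = all(close(g[2], p, tol) for _, g in children)
    return martingale and (a_fixed or p_fixed)

parent = ([1.0, 1.0], 0.5, [0.5, 0.5])
# p moves while a stays fixed (the kind of step produced by signalling).
signalling = [(0.5, ([1.0, 1.0], 1.0, [1.0, 0.0])), (0.5, ([1.0, 1.0], 0.0, [0.0, 1.0]))]
# a moves while p stays fixed (the kind of step produced by a lottery).
lottery = [(0.5, ([2.0, 0.0], 1.0, [0.5, 0.5])), (0.5, ([0.0, 2.0], 0.0, [0.5, 0.5]))]
# Both a and p move: still mean-preserving, but (3.15) fails.
both_move = [(0.5, ([2.0, 0.0], 0.5, [1.0, 0.0])), (0.5, ([0.0, 2.0], 0.5, [0.0, 1.0]))]
```

The first two steps pass the check; the third is a martingale step but changes $a$ and $p$ at the same time, which is exactly what (3.15) rules out.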

Finally, we define $G^*$ as the set of all points $g = (a, \beta, p)$ in
$\mathbb{R}^K_M \times \mathbb{R}_M \times \Delta^K$ such that there exists
a G-process starting at $g$. We note here that (3.12) and (3.15) are
essential conditions; without either one, $G^*$ will just be the convex
hull of $G$.

We  are  now  ready  to  state  our  main  result. 

Main Theorem: Let $a \in \mathbb{R}^K$ and $\beta \in \mathbb{R}$. Then
$(a, \beta)$ are equilibrium payoffs in $\Gamma_\infty(p)$ if and only if
$(a, \beta, p) \in G^*$.

Thus, the set $G^*$ is the graph of the equilibrium payoffs
correspondence (as $p$ ranges over $\Delta^K$).

The Main Theorem and Proposition 2.14 will be proved together (we
know of no direct proof of the latter alone). This will be done by
showing first (in Section 4) that all equilibrium payoffs, according to
the regular definition (2.7)-(2.10), belong to $G^*$, and second, by
constructing (in Section 5) a uniform equilibrium (cf. (2.11) and
(2.12)) corresponding to any point in $G^*$.

The second part of the proof leads us to an important additional
result; namely, that all equilibrium points in $\Gamma_\infty(p)$ are
equivalent to a special class of equilibria (those we construct in
Section 5). Informally, such an equilibrium consists of a "master plan",
which is followed by each player so long as the other follows it too; and
of "punishments", which come into effect after a deviation from the master
plan has been detected.

The  master  plan  is  a  sequence  of  "communications"  between  the  two 
players,  the  purpose  of  which  is  to  eventually  settle  on  a  point  in  G 
which  is  played  forever  from  then  on  (using  frequencies),  and  leads  to 
the  desired  "payoffs" .The  communications  are  of  two  sorts: 

"signalling",  where  the  informed  player  1  plays  dependent  on  k  (and 
thus  reveals  some  of  his  information  to  player  2,  who  can  update  his 
posterior  probabilities);  and  joint  decisions,  more  precisely  "jointly 
controlled  lotteries",  where  the  two  players  make  together  a 
randomization  on  how  to  continue  the  play.  Signalling  has  already  been 
obtained  in  the  zero-sum  case;  however,  the  jointly  controlled  lotteries 
(in  which  the  uninformed  player  plays  a  no  lesser  role  than  the  informed 
one)  are  a  feature  of  the  non-zero-sum  case  only. 
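The robustness claimed for jointly controlled lotteries can be illustrated with a small sketch. It assumes one common way to realize such a lottery (each player announces a number in {0, ..., n-1} and the outcome is the sum modulo n); this construction and the name `jcl_outcome_distribution` are illustrative assumptions, and the lotteries actually built in Section 5 may differ in detail.

```python
from fractions import Fraction
from itertools import product

def jcl_outcome_distribution(dist1, dist2):
    """Distribution of (i + j) mod n when i ~ dist1 and j ~ dist2 independently."""
    n = len(dist1)
    out = [Fraction(0)] * n
    for i, j in product(range(n), repeat=2):
        out[(i + j) % n] += dist1[i] * dist2[j]
    return out

n = 4
uniform = [Fraction(1, n)] * n
# Two hypothetical deviations: always announce 0, or announce with a bias.
always_zero = [Fraction(1), Fraction(0), Fraction(0), Fraction(0)]
biased = [Fraction(1, 2), Fraction(1, 4), Fraction(1, 8), Fraction(1, 8)]

for deviation in (always_zero, biased):
    # Whoever deviates, the outcome stays uniform if the other player is uniform.
    assert jcl_outcome_distribution(uniform, deviation) == uniform
    assert jcl_outcome_distribution(deviation, uniform) == uniform
```

In this sketch neither player can shift the outcome distribution alone, which is the sense in which both players "control" the randomization.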

At the end of the communication period (which we assume for the moment to consist of finitely many stages only), player 1 will play independent of $\kappa$ (otherwise, he will reveal additional information) - and thus a non-revealing equilibrium results from then on (a point in G). In the general case, the sequence of communications may be infinite. However, after a long enough time, almost everything that was ever going to be revealed (by player 1) or decided (by a joint randomization) has already occurred - and we are essentially at a non-revealing point again (i.e., in G). To generate the right payoffs, "payoff accumulation" periods are then introduced between communications - at which both players choose prescribed moves (again - with the correct frequencies).

Finally, punishments are always in accordance with the strategies given by Propositions 3.5 and 3.6, respectively (see Proposition 3.16).

The  structure  of  such  equilibria  is  summarized  in  Figure  1. 

Figure  1 
The G-process is thus "followed" during the play. At each stage, the corresponding $g_n = (a_n,\beta_n,p_n)$ will serve as a "state variable", with $a_n$ in $R_M^K$ being the vector payoff player 1 will get from then on, $\beta_n$ in $R_M$ the same for player 2 (averaging over $\kappa$), and $p_n$ in $\Delta^K$ the vector of posterior probabilities for $\kappa$.

Why is an equilibrium thus obtained? Deviations during communications stages are not helpful: Jointly controlled lotteries are so designed as to have each one of the players generate the right probabilities even if the other does not; as for signalling by player 1, it occurs precisely when $a_{n+1} = a_n$ in the G-process, which makes him indifferent among the various alternatives. In all other cases, the punishments keep the players in line. This is due to the following:

Proposition 3.16: Let $\{(a_n,\beta_n,p_n)\}_{n=1}^\infty$ be an $(R_M^K \times R_M \times \Delta^K)$-valued martingale, converging a.s. to $(a_\infty,\beta_\infty,p_\infty)$. Then:

(i) $a_\infty$ satisfies (3.3) a.s. if and only if $a_n$ satisfy (3.3) a.s. for all$^{5/}$ $n = 1,2,\ldots$ .

(ii) $(\beta_\infty,p_\infty)$ satisfies (3.4) a.s. if and only if $(\beta_n,p_n)$ satisfy (3.4) a.s. for all $n = 1,2,\ldots$ .

Proof: The "if" part is obtained by taking the limit as $n \to \infty$ (in (ii), we use the continuity of the function vex val$_2$B - e.g., see Mertens & Zamir (1980, Theorem 3.14)).

Let $\{\mathcal{Z}_n\}_{n=1}^\infty$ be the corresponding sequence of $\sigma$-fields, then we have $a_n = E(a_\infty | \mathcal{Z}_n)$ by the martingale theorem. The "only if" part in (i) is obtained by taking conditional expectations over $\mathcal{Z}_n$. As for (ii),

$\beta_n = E(\beta_\infty | \mathcal{Z}_n) \ge E((\mathrm{vex\ val}_2 B)(p_\infty) | \mathcal{Z}_n)$

$\ge (\mathrm{vex\ val}_2 B)(E(p_\infty | \mathcal{Z}_n)) = (\mathrm{vex\ val}_2 B)(p_n)$ ,

where we used the convexity of the function vex val$_2$B. Q.E.D.

This  last  result  leads  to  an  additional  interpretation  of  G*  as 
outcomes  of  bargaining  processes  -  see  Aumann  [1981]  . 

Corollary 3.17: Let $(a,\beta)$ be equilibrium payoffs in $\Gamma_\infty(p)$. Then $a$ and $\beta$ are individually rational for player 1 and player 2, respectively.

Proof: Proposition 3.16 for n = 1 (recall (3.12)). Q.E.D.

Another property of a G-process (which led to the name "bi-martingale") is as follows.

Proposition 3.18: Let $\{(a_n,p_n)\}_{n=1}^\infty$ be an $(R^K \times \Delta^K)$-valued martingale with respect to a non-decreasing sequence of $\sigma$-fields $\{\mathcal{Z}_n\}_{n=1}^\infty$. If (3.15) is satisfied, then $\{a_n \cdot p_n\}_{n=1}^\infty$ is also a martingale with respect to $\{\mathcal{Z}_n\}_{n=1}^\infty$.

Proof: Let n be such that $a_{n+1} = a_n$ a.s.; then

$E(a_{n+1} \cdot p_{n+1} | \mathcal{Z}_n) = E(a_n \cdot p_{n+1} | \mathcal{Z}_n) = a_n \cdot E(p_{n+1} | \mathcal{Z}_n) = a_n \cdot p_n$ ,

since $a_n$ is $\mathcal{Z}_n$-measurable. The same when $p_{n+1} = p_n$.

Q.E.D.
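A small numerical check may help to see why the "bi" property (3.15) matters for Proposition 3.18. The sketch below uses made-up two-branch steps with K = 2 (all numbers and the helper names `dot` and `expect` are illustrative assumptions, not taken from the paper): when only one of the two coordinates moves, the expectation of the product is preserved; when both move at once, it need not be.

```python
from fractions import Fraction as F

def dot(a, p):
    """Inner product a . p of two vectors of equal length."""
    return sum(x * y for x, y in zip(a, p))

def expect(branches, f):
    """Expectation of f(a, p) over a list of (probability, a, p) branches."""
    return sum(w * f(a, p) for w, a, p in branches)

a_n, p_n = (F(3), F(1)), (F(1, 2), F(1, 2))

# Signalling step: a stays fixed, p splits into two posteriors averaging to p_n.
signalling = [(F(1, 2), a_n, (F(3, 4), F(1, 4))),
              (F(1, 2), a_n, (F(1, 4), F(3, 4)))]
# Lottery step: p stays fixed, a splits into two vectors averaging to a_n.
lottery = [(F(1, 3), (F(5), F(0)), p_n),
           (F(2, 3), (F(2), F(3, 2)), p_n)]
# Both move at once: each coordinate is still a martingale, but (3.15) fails.
both = [(F(1, 2), (F(4), F(0)), (F(3, 4), F(1, 4))),
        (F(1, 2), (F(2), F(2)), (F(1, 4), F(3, 4)))]

assert expect(signalling, dot) == dot(a_n, p_n)
assert expect(lottery, dot) == dot(a_n, p_n)
assert expect(both, dot) != dot(a_n, p_n)
```

The third case shows that the martingale property of each coordinate alone does not give the conclusion of the proposition.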
4. From Equilibrium to Martingale

This section contains the proof of the first part of our result; namely, given an equilibrium point we construct the corresponding G-process (see Proposition 4.43 at the end of the section for a precise statement).

We start with an informal discussion of the proof. Let $(\sigma,\tau)$ be an equilibrium point; to simplify the arguments, let us assume that the frequencies with which the various pairs (i,j) in $I \times J$ are played always converge. Let $c_\infty = (c_\infty^k)_{k \in K}$ and $d_\infty = (d_\infty^k)_{k \in K}$ be the limit payoffs, then clearly $(c_\infty,d_\infty) \in F$. For every history $h_t$ up to stage t, we then define the following: for each k in K, $a^k(h_t)$ is the expected payoff to player 1 if $\kappa = k$ (thus, $a^k(h_t)$ is just the expectation of $c_\infty^k$ given $h_t$); $\beta(h_t)$ is the expected payoff to player 2 (the expectation of $d_\infty^\kappa$ given $h_t$); and $p^k(h_t)$ is the (posterior) probability that $\kappa = k$ (again, given $h_t$). We next introduce "half-steps", i.e., we define the above conditional expectations when given both $h_t$ and the next move $i_t$ of player 1; we will thus write $a^k(h_t,i_t)$, and so on.

X  X 

Assume $h_t$ has positive probability of occurring when $\kappa = k$. Then all possible moves $i_t$ of player 1 (i.e., those with $\sigma(h_t; k)(i_t) > 0$) must have the same expected payoff $a^k(h_t,i_t)$. Otherwise, player 1 could give probability 1 to that $i_t$ leading to the highest payoff; this would be "undetected" by player 2 (since this $i_t$ is possible according to $\sigma$), thus giving an expected payoff of $a^k(h_t,i_t)$, which is higher than that given by $\sigma$. This contradicts the equilibrium conditions, therefore $a^k(h_t,i_t)$ must be constant (hence, equal to $a^k(h_t)$) for all possible $i_t$'s. A similar argument shows that for the other $i_t$'s an inequality is obtained (if they are not chosen, then their corresponding payoff cannot be higher); this will eventually lead to the condition (3.10).

Next, consider the half-step from $(h_t,i_t)$ to $h_{t+1} = (h_t,i_t,j_t)$. Since player 2 does not know $\kappa$, $j_t$ is independent of it, and the posterior probabilities cannot change. We thus have $p^k(h_t,i_t) = p^k(h_{t+1})$.

It is easy to check that in all other cases, the martingale conditions are satisfied; e.g., $E(a^k(h_{t+1}) | h_t,i_t) = a^k(h_t,i_t)$, and so on. We have therefore obtained a martingale (with the index set being that of half-steps), which furthermore satisfies (3.15). The individual rationality conditions (3.3) and (3.4) also hold (since otherwise $(\sigma,\tau)$ will not be an equilibrium), and one can show that, in the limit (which exists by the martingale convergence theorem), a point in G is a.s. reached.

The actual proof will be quite complicated. Since we have no convergence of the payoffs, we will need to use Banach limits. To facilitate following the arguments, we divided the proof into a sequence of subsections.
4.1 The Probability Space

For each $t \in N$ (the set of positive integers), we defined

$H_t = (I \times J)^{t-1}$ ,

the set of histories before stage t. We also define the set of infinite histories

$H_\infty = \prod_{t=1}^\infty (I \times J)$ ,

an element of $H_\infty$ being a sequence $(i_t,j_t)_{t=1}^\infty$ of moves made by the two players at all stages.

On $H_\infty$ we define for each $t \in N$ the finite field generated by $H_t$, and call it $\mathcal{H}_t$; thus, two infinite histories belong to the same atom in $\mathcal{H}_t$ if and only if they coincide up to (but not including) t. Let $\mathcal{H}_\infty$ be the $\sigma$-field generated by all the $\mathcal{H}_t$'s (usually called the cylindrical or the product $\sigma$-field on the space $H_\infty$).

The basic probability space will also include the choice of $\kappa$ in K by chance. Thus, let $\Omega = H_\infty \times K$ be endowed with the $\sigma$-field $\mathcal{H}_\infty \otimes 2^K$. Each pair of strategies $(\sigma,\tau)$ and each probability vector $p \in \Delta^K$ for the initial chance move determine a probability distribution on this space. We denote it by $P_{\sigma,\tau,p}$; note that $E_{\sigma,\tau,p}$ used in Section 2 is precisely the expectation with respect to $P_{\sigma,\tau,p}$, and $E^k_{\sigma,\tau}$ is the conditional expectation given $\kappa = k$.
a,r 
We will use some additional fields on $H_\infty$. For each $t \in N$, let

$H_{t+1/2} = (I \times J)^{t-1} \times I = H_t \times I$ ,

and denote by $\mathcal{H}_{t+1/2}$ the finite field it generates. We have now defined $H_s$ and $\mathcal{H}_s$ for all half-integers s, namely all $s \in N_{1/2} = \{1, 1\tfrac{1}{2}, 2, \ldots\}$. Note that $\{\mathcal{H}_s\}_{s \in N_{1/2}}$ is an increasing sequence of finite subfields of $\mathcal{H}_\infty$, converging to $\mathcal{H}_\infty$ as $s \to \infty$.

Since our probability space is actually $\Omega = H_\infty \times K$ and not $H_\infty$, we will denote the field generated by $\mathcal{H}_s$ on $\Omega$ also by $\mathcal{H}_s$; this will generate no confusion.

4.2  Banach  Limit 

In  order  to  deal  with  the  non-summability  of  the  sequences  of 
payoffs,  we  introduce  the  concept  of  a  "Banach  limit"  (e.g.,  see  Dunford 
and  Schwartz  [1958],  p.  73). 

As usual, let $\ell_\infty$ be the (Banach) space of all real bounded sequences $x = \{x_n\}_{n=1}^\infty$. A Banach limit is a real operator L on $\ell_\infty$ with the following properties (holding for all $x = \{x_n\}$ and $y = \{y_n\}$ in $\ell_\infty$, and $\lambda$, $\mu$ in R):

(4.1) $L(\{\lambda x_n + \mu y_n\}) = \lambda L(\{x_n\}) + \mu L(\{y_n\})$ ,

(4.2) $L(\{x_{n+1}\}) = L(\{x_n\})$ ,

(4.3) $\liminf_{n \to \infty} x_n \le L(\{x_n\}) \le \limsup_{n \to \infty} x_n$ .

In particular, note that (4.3) implies:

(4.4) $L(\{x_n\}) = \lim_{n \to \infty} x_n$ , if $\{x_n\}$ is a convergent sequence.

Therefore the Banach limit is an extension of the notion of limit (to all bounded sequences). To slightly simplify the notation, we will henceforth write $L[x_n]$ for $L(\{x_n\}_{n=1}^\infty)$.
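As an illustration of how properties (4.1), (4.2) and (4.4) pin down the Banach limit of a non-convergent sequence, consider the alternating sequence $x = (0,1,0,1,\ldots)$. This worked example is an addition for orientation only and is not used in the proof.

```latex
% x_n + x_{n+1} = 1 for every n, so by linearity (4.1) and by (4.4)
% applied to the constant sequence 1:
L[x_n] + L[x_{n+1}] = L[x_n + x_{n+1}] = L[1] = 1 .
% Shift invariance (4.2) gives L[x_{n+1}] = L[x_n], hence
2\,L[x_n] = 1 \quad\Longrightarrow\quad L[x_n] = \tfrac{1}{2} .
% By contrast, (4.3) alone only yields 0 \le L[x_n] \le 1 .
```

For a general bounded sequence the value of $L$ may depend on the particular Banach limit chosen, which is why $L$ is fixed once and for all in what follows.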

Three  further  properties  of  Banach  limits  will  be  needed. 

Lemma 4.5: Let L be a Banach limit, and let $\{x_n\}, \{y_n\} \in \ell_\infty$. Then

$|L[x_n] - L[y_n]| \le \limsup_{n \to \infty} |x_n - y_n|$ .

Proof: Immediate by (4.1) and (4.3). Q.E.D.

Lemma 4.6: Let L be a Banach limit, and $X = \{X_n\}_{n=1}^\infty$ an $\ell_\infty$-valued random variable (i.e., X is a measurable function from some probability space into $\ell_\infty$). If X has only finitely many values, then

$L[E(X_n)] = E(L[X_n])$ .

Proof: Immediate by (4.1). Q.E.D.


In particular, this result will be useful for conditional expectations over finite fields. One could actually define a stronger concept of Banach limit, which commutes with the expectation operator for any uniformly bounded (or, even uniformly integrable) sequence of random variables, without the finiteness assumption. The construction of such a so-called "medial limit" requires however the use of the continuum hypothesis, and it is not needed in our proof (cf. Mokobodzki, see Meyer [1973]).


Lemma 4.7: Let L be a Banach limit, and C a compact and convex subset of some Euclidean space $R^m$. Let $\{x_n\}_{n=1}^\infty$ be a sequence in C, with $x_n = (\xi_n(1),\xi_n(2),\ldots,\xi_n(m))$. Let $\eta(r) = L[\xi_n(r)]$ for $r = 1,2,\ldots,m$. Then $y = (\eta(1),\eta(2),\ldots,\eta(m)) \in C$.

Proof: Let q be any vector in $R^m$, then by (4.1), (4.3) and $x_n \in C$,

$q \cdot y = L[q \cdot x_n] \le \limsup_{n \to \infty} q \cdot x_n \le \sup \{q \cdot c : c \in C\}$ .

This holds for all q; since C is a compact convex set, it implies $y \in C$. Q.E.D.


Given a Banach limit L, we can now define the concept of an L-equilibrium point in $\Gamma_\infty(p)$, by replacing (2.7) with

(4.8) $L[E^k_{\sigma,\tau}(a_T^k)] \ge L[E^k_{\sigma',\tau}(a_T^k)]$ ,

and (2.8) with

(4.9) $L[E_{\sigma,\tau,p}(\beta_T)] \ge L[E_{\sigma,\tau',p}(\beta_T)]$ ,

where the limit L is taken with respect to the index $T = 1,2,\ldots$; this convention will be kept throughout this section. The corresponding payoffs will then be

(4.10) $L[E^k_{\sigma,\tau}(a_T^k)] = a^k$

for each k in K, and

(4.11) $L[E_{\sigma,\tau,p}(\beta_T)] = \beta$ .

We put $a = (a^k)_{k \in K}$.

In view of (4.3), every equilibrium point is also an L-equilibrium point for any Banach limit L.

Throughout this section, we fix the following: a Banach limit L, a probability vector p in $\Delta^K$, and an L-equilibrium point $(\sigma,\tau)$ in $\Gamma_\infty(p)$ with payoffs $(a,\beta) \in R_M^K \times R_M$. Unless stated otherwise, the probability measure $P \equiv P_{\sigma,\tau,p}$ is assumed (on the space $\Omega$), with $E \equiv E_{\sigma,\tau,p}$ the corresponding expectation operator. Thus, all statements "a.s.", "martingale", and so on, will be with respect to P. Also, we will use $E^k$ for the conditional expectation $E(\cdot\,|\,\kappa = k)$.
Our purpose is to construct a G-process starting at $(a,\beta,p)$. The probability space on which it will be defined is $\Omega$, and the sequence of fields is $\{\mathcal{H}_s\}_{s \in N_{1/2}}$.


4.3 The Martingale $\{p_s\}$

For each $k \in K$, $s \in N_{1/2}$ and a history $h_s \in H_s$, let $p_s^k \equiv p_s^k(h_s)$ be the conditional probability of the "true" game $\kappa$ being k, given $\sigma$, $\tau$, p and $h_s$ (namely, if $s = t \in N$, the first t - 1 moves of each player; if $s = t + \tfrac{1}{2}$, $t \in N$, the first t moves of player 1 and t - 1 moves of player 2). We can thus write

$p_s^k = P_{\sigma,\tau,p}(\kappa = k \,|\, \mathcal{H}_s) \equiv P(\kappa = k \,|\, \mathcal{H}_s)$

(on each atom $h_s \in H_s$ of $\mathcal{H}_s$, $p_s^k$ is a.s. constant, thus a.s. equal to $p_s^k(h_s)$). We put $p_s = (p_s^k)_{k \in K}$.


Proposition 4.12: The sequence $\{p_s\}_{s \in N_{1/2}}$ is a $\Delta^K$-valued martingale with respect to $\{\mathcal{H}_s\}_{s \in N_{1/2}}$, satisfying:

(4.13) $p_1 = p$ .

(4.14) $p_{t+1/2} = p_{t+1}$ for all $t \in N$ .

(4.15) There exists a $\Delta^K$-valued random variable $p_\infty$ such that $p_s \to p_\infty$ a.s. as $s \to \infty$.

Proof: The fact that $\{p_s\}$ forms a martingale is immediate from its definition. Since it is bounded, it must converge a.s., say to $p_\infty^k$; then $p_\infty = (p_\infty^k)_{k \in K}$. (4.14) follows from the fact that given $h_{t+1/2}$ (actually, only $h_t$ suffices), the t-th move of player 2 is independent of $\kappa$; as for (4.13), at t = 1 there is no history yet, hence posteriors and priors coincide. Q.E.D.
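The two half-steps in Proposition 4.12 can be checked on a toy one-stage example with two games and two moves. The numbers, and the names `posterior`, `sigma` and `tau`, are hypothetical and chosen only for illustration: a move of player 1 (whose mixed move depends on k) shifts the posterior, the posteriors average back to the prior, and a move of player 2 (whose mixed move does not depend on k) leaves the posterior unchanged.

```python
from fractions import Fraction as F

def posterior(prior, sigma, move):
    """Bayes update of the belief over k after observing `move`.

    sigma[k][i] is the probability of move i when the true game is k.
    """
    joint = [pk * sigma[k][move] for k, pk in enumerate(prior)]
    total = sum(joint)
    return [x / total for x in joint]

prior = [F(1, 2), F(1, 2)]
# Hypothetical one-stage behaviour: in game 0 player 1 favours move 0.
sigma = [[F(3, 4), F(1, 4)],
         [F(1, 4), F(3, 4)]]

move_prob = [sum(pk * sigma[k][i] for k, pk in enumerate(prior)) for i in range(2)]
posteriors = [posterior(prior, sigma, i) for i in range(2)]

# Martingale property: the posteriors average back to the prior.
mean = [sum(move_prob[i] * posteriors[i][k] for i in range(2)) for k in range(2)]
assert mean == prior

# Player 2's mixed move tau is the same in every game, so observing his move
# leaves the belief unchanged, as in (4.14).
tau = [F(2, 5), F(3, 5)]
tau_as_sigma = [tau, tau]
assert posterior(posteriors[0], tau_as_sigma, 1) == posteriors[0]
```

The same computation, applied history by history, is what the definition of $p_s^k$ as a conditional probability encodes.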

4.4 The Martingales $\{\gamma_s\}$ and $\{\delta_s\}$

In Section 2, we defined the average payoffs of the two players up to time T (see (2.5) and (2.6)). We will find it useful to define also

(4.16) $\alpha_T = \frac{1}{T} \sum_{t=1}^T A^\kappa(i_t,j_t)$

(i.e., $\alpha_T = a_T^\kappa$). For each $s \in N_{1/2}$, let

$\gamma_s = L[E(\alpha_T | \mathcal{H}_s)]$ ,

$\delta_s = L[E(\beta_T | \mathcal{H}_s)]$ .

Thus, $\gamma_s$ and $\delta_s$ are the (Banach) limits of the expected average payoffs to player 1 and player 2, respectively, given a history $h_s$.


Proposition 4.17: The sequences $\{\gamma_s\}_{s \in N_{1/2}}$ and $\{\delta_s\}_{s \in N_{1/2}}$ are $R_M$-valued martingales with respect to $\{\mathcal{H}_s\}_{s \in N_{1/2}}$, satisfying:

(4.18) $\gamma_1 = p \cdot a$ and $\delta_1 = \beta$ .

(4.19) There exist $R_M$-valued random variables $\gamma_\infty$ and $\delta_\infty$ such that $\gamma_s \to \gamma_\infty$ and $\delta_s \to \delta_\infty$ a.s. as $s \to \infty$.

Proof: We can use Lemma 4.6 (the field $\mathcal{H}_{s+1/2}$ being finite, $\gamma_{s+1/2}$ has finitely many values):

$E(\gamma_{s+1/2} | \mathcal{H}_s) = E(L[E(\alpha_T | \mathcal{H}_{s+1/2})] \,|\, \mathcal{H}_s)$

$= L[E(E(\alpha_T | \mathcal{H}_{s+1/2}) \,|\, \mathcal{H}_s)]$

$= L[E(\alpha_T | \mathcal{H}_s)] = \gamma_s$ .

Thus $\{\gamma_s\}_{s \in N_{1/2}}$ forms a martingale. It is bounded by M (which bounds all possible payoffs by (3.8), hence also averages, expectations and limits, by (4.3), of those). Therefore it converges to some limit $\gamma_\infty$. For s = 1, we have

$E(\alpha_T | \mathcal{H}_1) = E(\alpha_T) = \sum_{k \in K} p^k E^k(a_T^k)$ ,

hence (4.1) and (4.10) give $\gamma_1 = p \cdot a$. The sequence $\{\delta_s\}$ is dealt with in a similar way ($\delta_1 = \beta$ is just (4.11)). Q.E.D.


4.5 The Martingales $\{c_s\}$ and $\{d_s\}$

We now associate vector payoffs to each infinite history. We define, for each k in K and T in N,

$b_T^k = \frac{1}{T} \sum_{t=1}^T B^k(i_t,j_t)$ ,

in a similar way to the definition (2.5) of $a_T^k$. Note that these are random variables, $\mathcal{H}_{T+1}$-measurable (k is fixed; in contrast, $\alpha_T$ and $\beta_T$ in (4.16) and (2.6) are $(\mathcal{H}_{T+1} \otimes 2^K)$-measurable). We further remark that $a_T^k$ and $b_T^k$ are defined for all histories, even those which may be incompatible with $\kappa = k$ according to $(\sigma,\tau)$.

If the limit of $a_T^k$ (as $T \to \infty$) would always exist, it would imply $E(\lim a_T^k) = \lim E(a_T^k)$. However, this is not the case, and the Banach limit L commutes with the expectation operator if there are only finitely many values (see Lemma 4.6 and the discussion thereafter). We define, for each $s \in N_{1/2}$,

$c_s^k = L[E(a_T^k | \mathcal{H}_s)]$ ,

$d_s^k = L[E(b_T^k | \mathcal{H}_s)]$

(again, for each k in K). Note that the expectations are not conditional on $\kappa = k$; thus, the probability of any history is its total probability, summed over all k in K.

One can interpret $c_s^k$ and $d_s^k$ as follows. Let $h_s \in H_s$ have positive probability, then $p_s = p_s(h_s)$ is the vector of posterior probabilities for the various games k. Assume that after $h_s$ occurred, player 1 replaces his strategy $\sigma$ by his average non-revealing strategy there; namely, for all k in K, he uses $\sum_{k' \in K} p_s^{k'} \sigma_t(h_t; k')$ instead of $\sigma_t(h_t; k)$ whenever $t \ge s$ and $h_t$ coincides with $h_s$ up to s. The expected average payoffs up to T in game k will then be $E(a_T^k | h_s)$ and $E(b_T^k | h_s)$, respectively. As we shall see later, the difference in payoffs due to this change in strategy becomes negligible as $s \to \infty$ (Proposition 4.23). Intuitively, this is due to the fact that after sufficiently many stages, player 1 has already revealed (almost) everything he is ever going to reveal about the true game; therefore, he must thereafter play (almost) non-revealing, or (almost) independent of $\kappa$. In technical terms, this occurs whenever the martingales are close to their limits.

As usual, we write $c_s$ for $(c_s^k)_{k \in K}$ and $d_s$ for $(d_s^k)_{k \in K}$. The set F was defined in (3.7) as the set of all "feasible" vector payoffs to both players (in the one-shot game).

Proposition 4.20: The sequences $\{c_s\}_{s \in N_{1/2}}$ and $\{d_s\}_{s \in N_{1/2}}$ are $R_M^K$-valued martingales with respect to $\{\mathcal{H}_s\}_{s \in N_{1/2}}$, satisfying:

(4.21) There exist $R_M^K$-valued random variables $c_\infty$ and $d_\infty$ such that $c_s \to c_\infty$ and $d_s \to d_\infty$ a.s. as $s \to \infty$.

(4.22) $(c_\infty,d_\infty) \in F$ a.s.

Proof: The martingale property and (4.21) are proved in a similar way to Proposition 4.17. For every T in N, the vector $((a_T^k)_{k \in K}, (b_T^k)_{k \in K})$ belongs to the compact convex set F, as an average of such vectors. The same holds for its expectations, and by Lemma 4.7 for its Banach limits $(c_s,d_s)$ too. (4.22) now follows by letting $s \to \infty$. Q.E.D.


The next proposition makes precise the statement that, as $s \to \infty$, player 1 plays "almost" non-revealing after s (see the discussion following the definition of $c_s^k$ and $d_s^k$).

Proposition 4.23:

$\gamma_s - p_s \cdot c_s \to 0$

$\delta_s - p_s \cdot d_s \to 0$

a.s. as $s \to \infty$.

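Proof: We prove here the first part. Fix $s \in N_{1/2}$, and let $t \ge s$, $t \in N$. Conditioning over $\mathcal{H}_{t+1}$ and $\kappa$ gives (recall that $p_{t+1}^k = P(\kappa = k \,|\, \mathcal{H}_{t+1})$):

$E(A^\kappa(i_t,j_t) | \mathcal{H}_s) = E(\sum_{k \in K} p_{t+1}^k A^k(i_t,j_t) \,|\, \mathcal{H}_s)$

$= \sum_{k \in K} p_s^k E(A^k(i_t,j_t) | \mathcal{H}_s) + \sum_{k \in K} E((p_{t+1}^k - p_s^k) A^k(i_t,j_t) \,|\, \mathcal{H}_s)$ .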
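We sum this for all t in the range $s \le t \le T$, and note that total payoffs up to s are bounded by sM, to obtain

$|E(\alpha_T | \mathcal{H}_s) - \sum_{k \in K} p_s^k E(a_T^k | \mathcal{H}_s)| \le \frac{2sM}{T} + \frac{M}{T} \sum_{s \le t \le T} \sum_{k \in K} E(|p_{t+1}^k - p_s^k| \,|\, \mathcal{H}_s)$ .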

We denote (for each $s \in N_{1/2}$)

$\pi_s = \sum_{k \in K} \sup_{t \in N,\ t \ge s} |p_{t+1}^k - p_s^k|$

and let $T \to \infty$. By Lemma 4.5 and (4.1),

$|\gamma_s - p_s \cdot c_s| \le M \cdot E(\pi_s | \mathcal{H}_s)$ .

Since $\{p_s\}$ converges a.s. as $s \to \infty$ by (4.15), it follows that $E(\pi_s | \mathcal{H}_s) \to 0$ a.s. as $s \to \infty$; this assertion is proved in the next lemma. Q.E.D.


Lemma 4.24: Let $\{X_n\}_{n=1}^\infty$ be a bounded sequence of real random variables, converging a.s. as $n \to \infty$, and let $\{F_n\}_{n=1}^\infty$ be a non-decreasing sequence of $\sigma$-fields. Define

$Y_n = \sup_{m \ge n} |X_m - X_n|$ ;

then

$E(Y_n | F_n) \to 0$

a.s. as $n \to \infty$.

Proof: Let $X_\infty = \lim X_n$, and put $Z_n = \sup_{m \ge n} |X_m - X_\infty|$. Then $\{Z_n\}$ is a non-increasing sequence as $n \to \infty$, converging a.s. to zero. Therefore $\{E(Z_n | F_n)\}_{n=1}^\infty$ is a bounded super-martingale with respect to $\{F_n\}_{n=1}^\infty$, hence converges a.s. to some $Z_\infty$. Now $E(E(Z_n | F_n)) = E(Z_n) \to 0$, thus $E(Z_\infty) = 0$ and $Z_\infty = 0$ a.s. Noting that $Y_n \le 2Z_n$ completes the proof. Q.E.D.

Finally we have

Corollary 4.25:

$\gamma_\infty = p_\infty \cdot c_\infty$ a.s.

$\delta_\infty = p_\infty \cdot d_\infty$ a.s.

Proof: (4.15), (4.19), (4.21) and Proposition 4.23. Q.E.D.


4.6 The Martingales $\{e_s\}$ and $\{f_s\}$

For each k in K and s in $N_{1/2}$, we define

$e_s^k = \sup_{\sigma'} L[E^k_{\sigma',\tau}(a_T^k | \mathcal{H}_s)]$ ,

where $\sigma'$ ranges over all strategies of player 1 (note that the expectation now is conditional on $\kappa = k$). Thus, for every history $h_s \in H_s$, $e_s^k$ is the most player 1 can obtain if the true game is k and player 2 uses $\tau$, given that $h_s$ has already occurred.

s 

Proposition 4.26: For every k in K, s in $N_{1/2}$ and t in N:

(4.27) $e_1^k = a^k$ .

(4.28) $e_s^k \ge c_s^k$ .

(4.29) $e_{t+1/2}^k = E(e_{t+1}^k | \mathcal{H}_{t+1/2})$ .

(4.30) $e_t^k(h_t) = \max_{i_t \in I} e_{t+1/2}^k(h_t,i_t)$ for all $h_t$ in $H_t$ .

Proof: (4.27) is just (4.8) and (4.10). To obtain (4.28), we consider the following $\sigma'$: if $\kappa = k$ and $h_s$ occurred, play the average non-revealing strategy given by $\sigma$; namely

$\sigma'_t(h_t; k) = \sum_{k' \in K} p_s^{k'} \sigma_t(h_t; k')$ for all $t \ge s$ and $h_t$ in $H_t$ that coincide with $h_s$ up to s (see the discussion following the definition of $c_s$ and $d_s$ in subsection 4.5).

To prove (4.29), note that the additional information from $t + \tfrac{1}{2}$ to $t + 1$ is $j_t$, whose distribution depends on $\tau$ and $h_t$ only, hence is the same in $E^k_{\sigma',\tau}$ as in E. Therefore

$\sup_{\sigma'} L[E^k_{\sigma',\tau}(a_T^k | h_{t+1/2})]$

$= \sup_{\sigma'} L[E(E^k_{\sigma',\tau}(a_T^k | h_{t+1}) \,|\, h_{t+1/2})]$

$= \sup_{\sigma'} E(L[E^k_{\sigma',\tau}(a_T^k | h_{t+1})] \,|\, h_{t+1/2})$

(we used Lemma 4.6). Given $h_{t+1/2}$, the first stage player 1 has to choose a move is t + 1, and by that time he will already know $j_t$. Thus, the best he can do given $h_{t+1/2}$ is just to do his best given $(h_{t+1/2},j_t) = h_{t+1}$, for each possible $j_t$. Therefore, the last expression is

$= E(\sup_{\sigma'} L[E^k_{\sigma',\tau}(a_T^k | h_{t+1})] \,|\, h_{t+1/2})$ ,

proving (4.29).


Next, let $h_t \in H_t$ be given. For any $\sigma'$, its relevant part for $L[E^k_{\sigma',\tau}(a_T^k | h_t)]$ consists of a probability distribution $\pi = \sigma'_t(h_t; k)$ in $\Delta^I$ for choosing $i_t$, and some strategy afterwards $\sigma'' \equiv \sigma''(i_t) = \sigma'((h_t,i_t,\cdot); k)$, for each possible $i_t$. Therefore (again using Lemma 4.6)

$e_t^k(h_t) = \sup_{\pi} \sup_{\sigma''} E'^k(L[E'^k(a_T^k | h_t,i_t)] \,|\, h_t)$ ,

where $E'^k$ is just $E^k_{\sigma',\tau}$. The choice of $\sigma''(i_t)$ can be done separately for each $i_t$, therefore we can interchange the first $E'^k$ with the supremum over $\sigma''(i_t)$, to obtain

$e_t^k(h_t) = \sup_{\pi \in \Delta^I} \sum_{i_t \in I} \pi(i_t) e_{t+1/2}^k(h_t,i_t)$ .

The supremum is attained by giving positive probability $\pi(i_t)$ only to those $i_t$ for which $e_{t+1/2}^k(h_t,i_t)$ is maximal; this proves (4.30). Q.E.D.


It is easy to see that (4.29) and (4.30) imply that $\{e_s\}_{s \in N_{1/2}}$, where $e_s = (e_s^k)_{k \in K}$, forms a super-martingale. To obtain a martingale, we define

$f_s^k = e_s^k + \sum_{r \in N,\ r < s} (e_r^k - e_{r+1/2}^k)$

for all k in K and s in $N_{1/2}$, and put $f_s = (f_s^k)_{k \in K}$.

Proposition 4.31: The sequence $\{f_s\}_{s \in N_{1/2}}$ is an $R^K$-valued martingale with respect to $\{\mathcal{H}_s\}_{s \in N_{1/2}}$, satisfying:

(4.32) $f_1 = a$ .

(4.33) $f_t = f_{t+1/2}$ for all t in N .

(4.34) There exists an $R^K$-valued random variable $f_\infty$ such that $f_s \to f_\infty$ a.s. as $s \to \infty$.

(4.35) $f_s \ge e_s \ge c_s$ for all s in $N_{1/2}$.

(4.36) $f_\infty \ge c_\infty$ and $p_\infty \cdot f_\infty = p_\infty \cdot c_\infty$ a.s.

Proof: (4.32) is immediate from (4.27) and the definition of $f_1$. Let $t \in N$, then

$f_{t+1/2}^k = e_{t+1/2}^k + \sum_{r \in N,\ r < t} (e_r^k - e_{r+1/2}^k) + (e_t^k - e_{t+1/2}^k)$

$= e_t^k + \sum_{r \in N,\ r < t} (e_r^k - e_{r+1/2}^k) = f_t^k$ ,

proving (4.33). Moreover,

$E(f_{t+1}^k | \mathcal{H}_{t+1/2}) - f_{t+1/2}^k = E(e_{t+1}^k | \mathcal{H}_{t+1/2}) - e_{t+1/2}^k$ ,

since the sum of the additional terms is identical in both $f_{t+1}^k$ and $f_{t+1/2}^k$. Using (4.29) completes the proof that $\{f_s\}$ indeed forms a martingale (note: with respect to the probability measure $P \equiv P_{\sigma,\tau,p}$). Since it is bounded, (4.34) follows.

By (4.30), $f_s^k \ge e_s^k$ (all the additional terms are non-negative), hence $f_s \ge c_s$ and $f_\infty \ge c_\infty$ a.s. (recall (4.28) and (4.21)). Therefore $p_\infty \cdot f_\infty \ge p_\infty \cdot c_\infty = \gamma_\infty$ (by Corollary 4.25). To obtain the opposite inequality, we note that $\{p_s\}$ and $\{f_s\}$ form a bi-martingale, hence $\{p_s \cdot f_s\}$ is a martingale (Proposition 3.18), and we have by (4.13), (4.32) and (4.18)

$E(p_\infty \cdot f_\infty) = E(p_1 \cdot f_1) = p \cdot a = E(\gamma_1) = E(\gamma_\infty)$ ,

proving that $p_\infty \cdot f_\infty = \gamma_\infty = p_\infty \cdot c_\infty$ a.s.

Q.E.D.


4.7  Individual  Rationality 
We  start  with  player  1. 

Proposition 4.37: For all vectors q in $\Delta^K$ and all s in $N_{1/2}$,

$q \cdot f_s \ge q \cdot e_s \ge (\mathrm{val}_1 A)(q)$ .

Proof: Let $q \in \Delta^K$, and consider the one-shot zero-sum game A(q). By definition (3.1), player 1 has a strategy $u \in \Delta^I$ such that for any strategy $v \in \Delta^J$ of player 2,

(4.38) $(\mathrm{val}_1 A)(q) \le \sum_{i \in I} \sum_{j \in J} u_i v_j \sum_{k \in K} q^k A^k(i,j)$ .

Let $h_s \in H_s$ have positive probability (under $(\sigma,\tau)$). Define a new strategy $\sigma'$ of player 1 as follows: $\sigma'((h_s,\cdot); k) = u$ for all k, and $\sigma'$ equals $\sigma$ otherwise (thus, after $h_s$ has occurred, player 1 makes independent randomizations with distribution u at all stages and all k). By (4.38), we have for all t in N, $t \ge s$,

$(\mathrm{val}_1 A)(q) \le E_{\sigma',\tau}(\sum_{k \in K} q^k A^k(i_t,j_t) \,|\, h_s)$ .

As $T \to \infty$, payoffs before s become negligible in $a_T$, and we have (by (4.2), (4.3) and then (4.1)):

$(\mathrm{val}_1 A)(q) \le L[E_{\sigma',\tau}(\sum_{k \in K} q^k a_T^k \,|\, h_s)]$

$= \sum_{k \in K} q^k L[E_{\sigma',\tau}(a_T^k | h_s)]$ .

Recalling the definition of $e_s^k$ (note that given $h_s$, $E^k_{\sigma',\tau}$ is independent of k),

$(\mathrm{val}_1 A)(q) \le \sum_{k \in K} q^k e_s^k(h_s) = q \cdot e_s(h_s)$ ,

and $q \cdot e_s \le q \cdot f_s$ follows from (4.35).

Q.E.D.
Corollary 4.39: For all q in $\Delta^K$,

$q \cdot f_\infty \ge (\mathrm{val}_1 A)(q)$ a.s.

Proof: (4.34) and Propositions 3.16(i) and 4.37. Q.E.D.

We  consider  now  player  2. 

Proposition 4.40: For all s in $N_{1/2}$,

$\delta_s \ge (\mathrm{vex\ val}_2 B)(p_s)$ a.s.

Proof: For each q in $\Delta^K$, let $\Gamma_\infty^2(q)$ be defined in the same way as $\Gamma_\infty(q)$, but with payoff matrices $(-B^k)_{k \in K}$ instead of $(A^k)_{k \in K}$ for player 1. This is a zero-sum repeated game; therefore player 2 (the uninformed player) has a strategy $\hat\tau \equiv \hat\tau(q)$ such that

$\limsup_{T \to \infty} E_{\sigma',\hat\tau,q}(\frac{1}{T} \sum_{t=1}^T (-B^\kappa(i_t,j_t))) \le (\mathrm{cav}(\mathrm{val}_1(-B)))(q) = -(\mathrm{vex\ val}_2 B)(q)$

for all $\sigma'$ (cf. Aumann and Maschler [1966]; $\hat\tau$ may be taken to be the corresponding Blackwell strategy; "cav" is the concavification of a function, and we use (3.2)). Thus,

(4.41) $\liminf_{T \to \infty} E_{\sigma',\hat\tau,q}(\beta_T) \ge (\mathrm{vex\ val}_2 B)(q)$ .

Let $h_s \in H_s$ have positive probability under $(\sigma,\tau)$, and consider the following strategy $\tau'$ of player 2: after $h_s$ has occurred, $\tau'$ is $\hat\tau(p_s)$, where $p_s = p_s(h_s)$ is the vector of posterior probabilities given $h_s$; otherwise, $\tau'$ equals $\tau$. Let $T \ge s$, then we condition on $\mathcal{H}_s$, to obtain

$E(\beta_T) - E'(\beta_T) = P(h_s)(E(\beta_T | h_s) - E'(\beta_T | h_s))$ ,

where $E' \equiv E_{\sigma,\tau',p}$ (up to stage s, no difference between E and E'; afterwards, only if $h_s$ has occurred). Apply the Banach limit L as $T \to \infty$; since $(\sigma,\tau)$ is an equilibrium (see (4.9)), we get by (4.1)

$0 \le P(h_s)(L[E(\beta_T | h_s)] - L[E'(\beta_T | h_s)])$ ,

hence

$\delta_s(h_s) \equiv L[E(\beta_T | h_s)] \ge L[E'(\beta_T | h_s)]$

(since $P(h_s) > 0$). By (4.3) and (4.41) (with $\sigma' = \sigma$; note that payoffs up to s do not matter as $T \to \infty$), the proof is completed. Q.E.D.

Corollary 4.42: $\delta_\infty \ge (\mathrm{vex\ val}_2 B)(p_\infty)$ a.s.

Proof: (4.15), (4.19) and Propositions 3.16(ii) and 4.40. Q.E.D.

4.8  The  G-Process 


We have thus completed the proof of

Proposition 4.43: Let $(a,\beta)$ be the payoffs of an L-equilibrium point $(\sigma,\tau)$ in $\Gamma_\infty(p)$. Then there exists a G-process starting at $(a,\beta,p)$.

Proof: The probability space is $(\Omega, \mathcal{H}_\infty \otimes 2^K, P_{\sigma,\tau,p})$; the sequence of fields is $\{\mathcal{H}_s\}_{s \in N_{1/2}}$, and the G-process $\{g_s\}_{s \in N_{1/2}}$ is given by $g_s = (f_s,\delta_s,p_s)$. All the required properties are indeed satisfied: $g_1 = (a,\beta,p)$ by (4.13), (4.18) and (4.32); the limit $g_\infty = (f_\infty,\delta_\infty,p_\infty)$ (see (4.15), (4.19) and (4.34)) belongs to the set G a.s. by (4.22), (4.36) and Corollaries 4.25, 4.39 and 4.42; and finally the "bi" property (3.15) is given in (4.14) and (4.33). Q.E.D.

5.  From  Martingale  to  Equilibrium 

This  section  is  devoted  to  the  proof  of  the  second  half  of  our 
result;  namely,  given  a  G-process  the  corresponding  uniform  equilibrium 
point  is  constructed. 

Let $g = (a,\beta,p)$ belong to G*. Thus, we are given a probability space$^{10/}$ $(Z, \mathcal{Z}, Q)$, a non-decreasing sequence $\{\mathcal{Z}_n\}_{n=1}^{\infty}$ of finite subfields of $\mathcal{Z}$, and a G-process $\{g_n\}_{n=1}^{\infty} = \{(f_n, \delta_n, p_n)\}_{n=1}^{\infty}$ with respect to $\{\mathcal{Z}_n\}_{n=1}^{\infty}$, starting at g; i.e.,

(5.1)  $(f_1, \delta_1, p_1) = (a, \beta, p)$  Q-a.s.

Without loss of generality, we will assume that $\mathcal{Z}_1$ is the trivial field $\{Z, \emptyset\}$. Let $g_\infty = (f_\infty, \delta_\infty, p_\infty)$ be a Q-a.s. limit of $g_n$ as $n \to \infty$; then $g_\infty \in G$ a.s. We will find it useful to weaken the bi-property (3.15) to the following:

(5.2)  $\|f_{n+1} - f_n\| \cdot \|p_{n+1} - p_n\| = 0$ a.s. for all n = 1,2,....

This means that on each atom of $\mathcal{Z}_n$, either $f_{n+1}$ is constant (and thus equals $f_n$), or $p_{n+1}$ is constant (and equals $p_n$); however, which one of the two is true may differ from one atom to the other. It is easy to see that G* does not change (to obtain (3.15) from (5.2), insert between each $\mathcal{Z}_n$ and $\mathcal{Z}_{n+1}$ an additional field $\mathcal{Z}_{n+1/2}$, and put $g_{n+1/2} = g_{n+1}$ if $f_{n+1} = f_n$ and $g_{n+1/2} = g_n$ otherwise).

5.1  Standard  G-Process 

To  simplify  the  construction  of  the  equilibrium  point,  we  will 
work  with  a  G-process  having  the  following  additional  property: 

(5.3)  For every atom $z_n$ of $\mathcal{Z}_n$ there are exactly two atoms $z'_{n+1}$ and $z''_{n+1}$ of $\mathcal{Z}_{n+1}$ contained in $z_n$, and $Q(z'_{n+1} \mid z_n) = Q(z''_{n+1} \mid z_n) = 1/2$.

Such a G-process will be called standard.

Proposition 5.4: For every g in G* there exists a standard G-process starting at g.

Proof: We will show how to "transform" any G-process into a standard one.


Given a G-process $\{g_n\}_{n=1}^{\infty}$ with respect to $\{\mathcal{Z}_n\}_{n=1}^{\infty}$, we can describe the sequence of fields as a "probability tree" as follows. The nodes in the n-th layer are the atoms of $\mathcal{Z}_n$; the root (i.e., the first layer) can be taken to be Z (by (5.1)). A (directed) arc leads from an atom $z_n$ of $\mathcal{Z}_n$ to an atom $z_m$ of $\mathcal{Z}_m$ if and only if m = n + 1 and $z_m = z_{n+1} \subset z_n$. We associate the probability $Q(z_{n+1} \mid z_n)$ to this arc and define the probability of a finite path starting at the root to be the product of the probabilities of all its arcs. This clearly equals $Q(z_n)$, where $z_n$ is the endpoint of the path. This probability distribution is then uniquely extended in a standard way to all infinite paths in the tree starting at the root; we will denote this probability measure also by Q. This completes the description of our probability tree.

The G-process $\{g_n\}_{n=1}^{\infty}$ can now be regarded as being defined on the nodes of the tree; we will write $g_n(z_n)$ for the value of $g_n$ on the atom $z_n$ of $\mathcal{Z}_n$. The properties (3.12)-(3.14) and (5.2) defining a G-process become:

(i)  $g_1(z_1) = g$.

(ii)  $E(g_{n+1}(z_{n+1}) \mid z_{n+1} \text{ succeeds } z_n) = g_n(z_n)$ for all $z_n$.

(iii)  The sequence $\{g_n(z_n)\}_{n=1}^{\infty}$ converges for almost all infinite paths, and the limit $g_\infty$ belongs to G a.s.

(iv)  For each node $z_n$, either $f_{n+1}(z_{n+1}) = f_n(z_n)$ for all successors $z_{n+1}$ of $z_n$, or $p_{n+1}(z_{n+1}) = p_n(z_n)$ for all successors $z_{n+1}$ of $z_n$.
In order to obtain property (5.3), we need two kinds of modifications of the tree, and thus of the G-process. First, we make the number of (immediate) successors of each node exactly two; and second, we make the probability of every arc precisely 1/2.

For the former, we have two cases. If there is only one successor $z_{n+1}$ of $z_n$, we can add an additional copy of the whole subtree starting at $z_{n+1}$, and thus obtain two successors $z_{n+1}$ and $z'_{n+1}$ (which is identical to $z_{n+1}$), and moreover with probability 1/2 each (from $z_n$). Now, assume $z_n$ has more than two successors, say $\{z^r_{n+1}\}_{r=1}^{m}$. We then introduce additional nodes in between; e.g., at level n + 1 we will have $z^1_{n+1}$ and the union of $z^2_{n+1},\ldots,z^m_{n+1}$; from the latter, at level n + 2 we will have $z^2_{n+1}$ and the union of $z^3_{n+1},\ldots,z^m_{n+1}$; and so on. The probabilities of the new arcs will be defined as the corresponding conditional probabilities; the value of the G-process at the new nodes, as the conditional expectation. As an example, see Figure 2; the value of the G-process at the new node will be

$\dfrac{\lambda_2}{\lambda_2 + \lambda_3}\, g_{n+1}(z^2_{n+1}) + \dfrac{\lambda_3}{\lambda_2 + \lambda_3}\, g_{n+1}(z^3_{n+1})$.

Clearly, all four properties (i)-(iv) continue to hold after such modifications.
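The splitting of a node with more than two successors can be sketched as follows. This is a minimal illustration under stated assumptions, not part of the original construction: the function name `binarize` and the numerical example are hypothetical, and the process values are taken to be scalars rather than triples $(f,\delta,p)$.

```python
from fractions import Fraction

def binarize(probs, values):
    """Split one node with m > 2 successors into a chain of two-way splits.

    probs[r], values[r]: arc probability and process value of successor r.
    Each step is (cond_p, value_r, rest_value): successor r is reached with
    conditional probability cond_p; otherwise one moves to a new intermediate
    node whose value is the conditional expectation over the remaining ones.
    """
    steps = []
    remaining = Fraction(1)  # probability mass not yet split off
    for r in range(len(probs) - 1):
        cond_p = probs[r] / remaining
        rest_mass = remaining - probs[r]
        rest_value = sum(p * v for p, v in zip(probs[r + 1:], values[r + 1:])) / rest_mass
        steps.append((cond_p, values[r], rest_value))
        remaining = rest_mass
    return steps

# Hypothetical three-successor node, as in the situation of Figure 2.
probs = [Fraction(1, 2), Fraction(1, 3), Fraction(1, 6)]
values = [Fraction(0), Fraction(3), Fraction(6)]
steps = binarize(probs, values)
```

In this example the intermediate node carries the value 4, which is the average of 3 and 6 with weights 2/3 and 1/3, so the martingale property (ii) is preserved at the new node.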

[Figure 2]

Next we have to make the probabilities of all arcs precisely 1/2. Let $z'_{n+1}$ and $z''_{n+1}$ be the two successors of $z_n$, and let $\lambda'$ and $\lambda'' = 1 - \lambda'$ be the corresponding probabilities. We want to obtain $z'_{n+1}$ with probability $\lambda'$ and $z''_{n+1}$ with probability $\lambda''$ by using the probability 1/2 only. This is done as follows: we express $\lambda'$ as a binary fraction

$\lambda' = \sum_{m=1}^{\infty} \dfrac{\lambda_m}{2^m}$

with $\lambda_m = 0$ or 1 for all m. We then consider an infinite sequence of independent Bernoulli trials, with "success" and "failure" having probability 1/2 each, up to the first occurrence of "success". If this happens after m trials, then $z'_{n+1}$ "results" if $\lambda_m = 1$ and $z''_{n+1}$ "results" if $\lambda_m = 0$. Thus, the total probability of $z'_{n+1}$ is precisely $\lambda'$ (since the first "success" occurs at the m-th trial with probability $1/2^m$), and that of $z''_{n+1}$ is $\lambda''$. This structure now replaces the original randomization between $z'_{n+1}$ and $z''_{n+1}$ in the tree. As an example, see Figure 3 (note that $\lambda' = 2/3$ gives $\lambda_m = 1$ for m odd and $\lambda_m = 0$ for m even). Again, the value of the G-process at a new node is the corresponding expectation.
With probability one, either $z'_{n+1}$ or $z''_{n+1}$ will be reached. If we do this modification at all nodes, the properties (i)-(iv) will not be affected (there are only countably many nodes, hence the probability that "success" fails to occur at some node is still zero). Q.E.D.

Henceforth  we  will  assume  that  the  G-process  we  start  with  is 
already  standard. 
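The replacement of an arbitrary probability by fair coin flips, used in the proof above, can be checked numerically. The following is a small sketch assuming exact rational arithmetic; the helper names `binary_digits` and `prob_first_successor` are illustrative and not taken from the paper.

```python
from fractions import Fraction

def binary_digits(lam, count):
    """First `count` binary digits of lam in [0, 1): lam = sum_m digit_m / 2**m."""
    digits = []
    for _ in range(count):
        lam *= 2
        bit = 1 if lam >= 1 else 0
        digits.append(bit)
        lam -= bit
    return digits

def prob_first_successor(lam, depth):
    """Probability that z' results when the first success comes within `depth` trials.

    The first success occurs at trial m with probability 1/2**m, and z' results
    exactly when the m-th binary digit of lam equals 1.
    """
    digits = binary_digits(lam, depth)
    return sum(Fraction(d, 2 ** m) for m, d in enumerate(digits, start=1))

lam = Fraction(2, 3)
digits = binary_digits(lam, 6)            # alternating digits 1, 0, 1, 0, ...
approx = prob_first_successor(lam, 6)     # partial sum of the binary fraction
```

Truncating after `depth` trials leaves an error of at most $2^{-\text{depth}}$, which is the probability that no "success" has occurred yet; it vanishes as the depth grows, in line with the "probability one" statement.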

5.2  The Sequence $\{\theta_n\}$

The limit $g_\infty$ of the G-process belongs to G a.s.; by (3.9), a corresponding point in F is thus obtained, and from it, a point in the set $\Delta^{I \times J}$ of "feasible joint actions".

For $\theta = (\theta(i,j))_{i \in I, j \in J}$ in $\Delta^{I \times J}$ and k in K, we will denote

$A^k(\theta) = \sum_{i \in I} \sum_{j \in J} \theta(i,j)\, A^k(i,j)$,

and $A(\theta) = (A^k(\theta))_{k \in K}$; similarly for B.

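Proposition 5.5: There exists a $\Delta^{I \times J}$-valued random variable $\theta_\infty$ satisfying Q-a.s.

(5.6)  $f_\infty \ge A(\theta_\infty)$ and $p_\infty \cdot f_\infty = p_\infty \cdot A(\theta_\infty)$,

(5.7)  $\delta_\infty = p_\infty \cdot B(\theta_\infty)$.

Proof: By definition, $g_\infty \in G$ implies the existence of $(c_\infty, d_\infty) \in F$ satisfying $f_\infty \ge c_\infty$, $p_\infty \cdot f_\infty = p_\infty \cdot c_\infty$ and $\delta_\infty = p_\infty \cdot d_\infty$. Since F is precisely the set of $(A(\theta), B(\theta))$ for all $\theta$ in $\Delta^{I \times J}$, there is $\theta_\infty$ in $\Delta^{I \times J}$ such that $c_\infty = A(\theta_\infty)$ and $d_\infty = B(\theta_\infty)$. The measurability is obtained by the Measurable Selection Theorem (e.g., see Hildenbrand [1974]). Q.E.D.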

Proposition 5.8: There exists a sequence $\{\theta_n\}_{n=1}^{\infty}$ of $\Delta^{I \times J}$-valued random variables, satisfying Q-a.s. for all n in N and (i,j) in I × J:

(5.9)  $\theta_n$ is $\mathcal{Z}_n$-measurable.

(5.10)  $|\theta_n(i,j) - E(\theta_\infty(i,j) \mid \mathcal{Z}_n)| < 1/n$.

(5.11)  $\theta_n \to \theta_\infty$ as $n \to \infty$.

(5.12)  $n\,\theta_n(i,j)$ is an integer.

Proof: Define $\bar\theta_n = E(\theta_\infty \mid \mathcal{Z}_n)$, then $\{\bar\theta_n\}_{n=1}^{\infty}$ forms a martingale converging to $\theta_\infty$. Choose $\theta_n$ to be a rational approximation to $\bar\theta_n$ with denominators n (e.g., let $\theta^*_n(i,j) = [n\bar\theta_n(i,j)]/n$, where [x] denotes the largest integer not exceeding x, then $\theta_n(i,j)$ is either $\theta^*_n(i,j)$ or $\theta^*_n(i,j) + 1/n$, so as to have the sum equal 1). Q.E.D.
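The rational approximation in the proof of Proposition 5.8 can be sketched as below. This is one possible rounding rule (rounding up the entries with the largest fractional parts), chosen here as an assumption for illustration; the proof only requires that each entry be rounded down or up by less than 1/n with total sum 1. The name `round_to_denominator` is hypothetical.

```python
from fractions import Fraction
from math import floor

def round_to_denominator(theta_bar, n):
    """Round a probability vector to multiples of 1/n that still sum to 1.

    Every entry is first rounded down to a multiple of 1/n; the missing mass
    is then restored by adding 1/n to the entries with the largest
    fractional parts.
    """
    floors = [floor(n * x) for x in theta_bar]
    shortfall = n - sum(floors)  # number of entries to raise by 1/n
    order = sorted(range(len(theta_bar)),
                   key=lambda r: n * theta_bar[r] - floors[r],
                   reverse=True)
    for r in order[:shortfall]:
        floors[r] += 1
    return [Fraction(c, n) for c in floors]

theta_bar = [Fraction(1, 3), Fraction(1, 3), Fraction(1, 3)]
n = 4
theta_n = round_to_denominator(theta_bar, n)
```

Since the fractional parts sum to the shortfall and each is below 1, only entries with a positive fractional part are raised, so every entry moves by strictly less than 1/n, as in (5.10).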

5.3  The Strategies $\sigma$ and $\tau$

We can now define the pair of strategies $(\sigma,\tau)$. In a similar way to the so-called "Folk Theorem" for repeated games with complete information (for a detailed proof, see the Lecture Notes of Hart [1980, Section IV]), they are based on a master plan and punishments. Each player follows the master plan as long as the other one does so too (at least, as long as no deviation is detected), and uses the corresponding punishment otherwise.

The master plan consists of two parts. Stages t = n!, for all n = 1,2,..., are communication stages; the moves made serve as a means of transmitting information (from the informed to the uninformed player), or of making a joint decision. All the other stages are payoff periods; well-determined moves (namely, pure) are used in order for both players to accumulate the "right" payoffs. The sequence n! was chosen since (n - 1)! is negligible relative to n! as n goes to infinity; thus only the last period$^{11/}$ really counts. Any other sequence with the same property could be used just as well.

The master plan is derived from the G-process. The moves at stage t = n! correspond to the arcs from $z_n$ to $z_{n+1}$ in the tree (see subsection 5.1), whereas at stages (n - 1)! < t < n!, one "stays" at $z_n$. Thus, a function $\zeta$ is defined inductively from the set of finite histories in the game for which no deviations occurred, to the set of atoms of the fields $\{\mathcal{Z}_n\}$ or, equivalently, to the set of nodes in the tree.

Let $i' \ne i''$ be two elements of I, and $j' \ne j''$ two elements of J, fixed throughout the remainder of this section. These two (pure) moves for each player will actually be their communication alphabet (thus, they essentially "talk" in a binary language).$^{12/}$
Let t = n!, let $h_t$ be a history with no deviation from the master plan, and let $z_n = \zeta(h_t)$ be the corresponding node in the tree. We will define now the behaviour of each player at stage t, and also the resulting $\zeta(h_{t+1})$. If $p_{n+1} = p_n$ and $f_{n+1} = f_n$ at both nodes $z'_{n+1}$ and $z''_{n+1}$ succeeding $z_n$, the two players play arbitrarily at t = n!, and for all $i_t$ and $j_t$, $\zeta(h_{t+1}) = z'_{n+1}$, say. Otherwise, we distinguish two cases:

(i)  $p_{n+1} \ne p_n$ (and thus $f_{n+1} = f_n$).
(ii)  $f_{n+1} \ne f_n$ (and thus $p_{n+1} = p_n$).

In case (i), we define for each k in K

$\sigma(h_t; k)(i_t) = \begin{cases} \dfrac{p^k_{n+1}(z'_{n+1})}{2\,p^k_n(z_n)}, & \text{if } i_t = i', \\[2mm] \dfrac{p^k_{n+1}(z''_{n+1})}{2\,p^k_n(z_n)}, & \text{if } i_t = i'', \\[2mm] 0, & \text{otherwise.} \end{cases}$

Since $p_n(z_n) = [p_{n+1}(z'_{n+1}) + p_{n+1}(z''_{n+1})]/2$ (the martingale property, with both arc probabilities equal to 1/2 by (5.3)), $\sigma(h_t; k)$ is indeed a probability distribution over I. As for player 2, we let $\tau(h_t)$ be arbitrary in this case, and then for all $j_t$ in J, we put

$\zeta(h_t, i', j_t) = z'_{n+1}$ and $\zeta(h_t, i'', j_t) = z''_{n+1}$.


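Lemma 5.13: Assume that $P(\kappa = k \mid h_t) = p^k_n(\zeta(h_t))$ for all $k \in K$. Then

$P(\zeta(h_{t+1}) = z'_{n+1} \mid h_t) = P(\zeta(h_{t+1}) = z''_{n+1} \mid h_t) = \dfrac{1}{2}$

and

$P(\kappa = k \mid h_{t+1}) = p^k_{n+1}(\zeta(h_{t+1}))$.

Proof: Assume $i_t = i'$, then we have

$P(i_t = i' \mid h_t) = \sum_{k \in K} P(i_t = i' \mid h_t, \kappa = k) \cdot P(\kappa = k \mid h_t) = \sum_{k \in K} \dfrac{p^k_{n+1}(z'_{n+1})}{2\,p^k_n(z_n)}\, p^k_n(z_n) = \dfrac{1}{2} \sum_{k \in K} p^k_{n+1}(z'_{n+1}) = \dfrac{1}{2}$.

Therefore

$P(\kappa = k \mid h_{t+1}) = P(\kappa = k \mid h_t, i_t = i') = \dfrac{P(i_t = i' \mid h_t, \kappa = k) \cdot P(\kappa = k \mid h_t)}{P(i_t = i' \mid h_t)} = \dfrac{\dfrac{p^k_{n+1}(z'_{n+1})}{2\,p^k_n(z_n)}\, p^k_n(z_n)}{1/2} = p^k_{n+1}(z'_{n+1})$.

Similarly for $i_t = i''$. Q.E.D.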
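Lemma 5.13 can be verified on a small numerical instance. The sketch below assumes two states of nature and hypothetical posteriors chosen so that their average is the prior; the function names are illustrative only.

```python
from fractions import Fraction

def signalling_strategy(p_n, p_prime, p_double):
    """For each state k, the probabilities of sending i' and i'' in case (i)."""
    return [(pp / (2 * p), pd / (2 * p))
            for p, pp, pd in zip(p_n, p_prime, p_double)]

def posterior_after(signal, p_n, sigma):
    """Bayes update of the prior p_n after observing signal 0 (i') or 1 (i'').

    Returns the total probability of the signal and the posterior vector.
    """
    joint = [p * s[signal] for p, s in zip(p_n, sigma)]
    total = sum(joint)
    return total, [x / total for x in joint]

# Hypothetical posteriors at the two successor nodes z' and z''.
p_prime = [Fraction(3, 4), Fraction(1, 4)]
p_double = [Fraction(1, 4), Fraction(3, 4)]
# Prior at z_n: the average of the two, since both arcs have probability 1/2.
p_n = [(a + b) / 2 for a, b in zip(p_prime, p_double)]

sigma = signalling_strategy(p_n, p_prime, p_double)
prob_i1, post_i1 = posterior_after(0, p_n, sigma)
prob_i2, post_i2 = posterior_after(1, p_n, sigma)
```

Each signal has probability 1/2 and induces exactly the posterior attached to the corresponding successor node, which is the content of the lemma. The sketch assumes every prior coordinate is positive, so that the division by $2p^k_n(z_n)$ is defined.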
Thus, if the posterior probabilities for the various values of $\kappa$ at stage t = n! are $p_n$, then the new posteriors generated after the moves at time t are precisely $p_{n+1}$. Case (i) therefore corresponds to a transmission of information (about the value k of $\kappa$) from player 1 to player 2; we will henceforth call this signalling (by player 1).

In case (ii), $f_{n+1} \ne f_n$ and $p_{n+1} = p_n$; we define for all k in K

$\sigma(h_t; k)(i_t) = \begin{cases} 1/2, & \text{if } i_t = i', \\ 1/2, & \text{if } i_t = i'', \\ 0, & \text{otherwise,} \end{cases}$

$\tau(h_t)(j_t) = \begin{cases} 1/2, & \text{if } j_t = j', \\ 1/2, & \text{if } j_t = j'', \\ 0, & \text{otherwise,} \end{cases}$

and then

$\zeta(h_t, i', j') = \zeta(h_t, i'', j'') = z'_{n+1}$,  $\zeta(h_t, i', j'') = \zeta(h_t, i'', j') = z''_{n+1}$.

Lemma 5.14:

$P(\zeta(h_{t+1}) = z_{n+1} \mid h_t) = P(\zeta(h_{t+1}) = z_{n+1} \mid h_t, i_t) = P(\zeta(h_{t+1}) = z_{n+1} \mid h_t, j_t) = \dfrac{1}{2}$,

where $z_{n+1}$ stands for either $z'_{n+1}$ or $z''_{n+1}$, $i_t$ for i' or i'', and $j_t$ for j' or j''.
Proof: The choices of $i_t$ and $j_t$ are made independently. Q.E.D.

Thus, in case (ii) a lottery with probabilities 1/2, 1/2 is performed among $z'_{n+1}$ and $z''_{n+1}$. Moreover, no player has any control over the outcome: whichever of his two possible moves he chooses, the probabilities are the same (1/2, 1/2). Therefore, this is called (following Aumann, Maschler and Stearns [1968]) a jointly controlled lottery.
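The "no control" property of the jointly controlled lottery can be illustrated by a direct computation. In this sketch the two admissible moves of each player are encoded as 0 and 1 (an assumption made for brevity), and the outcome is z' when the moves match.

```python
from fractions import Fraction

HALF = Fraction(1, 2)

def prob_z_prime(dist_1, dist_2):
    """Probability of reaching z' when players mix independently.

    dist_1, dist_2: probabilities of moves 0 and 1 for players 1 and 2.
    z' results when the two moves coincide, z'' when they differ.
    """
    return sum(dist_1[i] * dist_2[j]
               for i in (0, 1) for j in (0, 1) if i == j)

honest = {0: HALF, 1: HALF}

# Whatever mixture player 1 uses over his two moves, the outcome
# probability stays 1/2 as long as player 2 randomizes honestly.
results = [prob_z_prime({0: x, 1: 1 - x}, honest)
           for x in (Fraction(0), Fraction(1, 3), Fraction(1))]
```

By symmetry the same holds with the roles of the players exchanged, which is why neither player can gain from an undetectable deviation at such a stage.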

This completes the definition of the master plan for t = n! (the communication stages). It corresponds to advancing one step in the tree (from $z_n$ to $z_{n+1}$). We next consider the payoff periods. Let $z_n = \zeta(h_{(n-1)!+1})$ (thus, we are just after $z_n$ was determined at stage (n - 1)!). Let $\theta_n = \theta_n(z_n)$ in $\Delta^{I \times J}$ be given by Proposition 5.8 (see (5.9)). At stages (n - 1)! + 1 through n! - 1, the players will play by frequencies; namely, the pair (i,j) will be played $\theta_n(i,j)$ of the time. Since all the denominators are n by (5.12), this can be done in cycles of length n each. For example, assume $\theta_n(i',j') = 1/n$, $\theta_n(i'',j'') = (n-1)/n$ and $\theta_n(i,j) = 0$ otherwise, then player 1 plays i' once (at t = (n - 1)! + 1), then n - 1 times i'' (at t = (n - 1)! + 2, ..., (n - 1)! + n), repeating this n-stage cycle up to (and including) t = n! - 1; as for player 2, he chooses j' at t = (n - 1)! + 1 and j'' at t = (n - 1)! + 2, ..., (n - 1)! + n, and so on. Clearly, we put $\zeta(h_t) = z_n$ for all (n - 1)! < t < n!, when the two players play as described.
We introduce the following notation: for every i in I, j in J and u, v in N with $u \le v$, let

(5.15)  $\phi_u^v(i,j) = \dfrac{1}{v - u + 1}\, \bigl|\{t \in N : u \le t \le v,\ i_t = i,\ j_t = j\}\bigr|$.

Thus, $\phi_u^v(i,j)$ is the frequency that the pair of moves (i,j) was used at stages u, u + 1, ..., v. Note that it is $\mathcal{H}_{v+1}$-measurable.

Lemma 5.16: Let $t \in N$, (n - 1)! < t < n!. Then, for all i in I and j in J,

$\bigl|\phi_{(n-1)!+1}^{t}(i,j) - \theta_n(i,j)\bigr| \le \dfrac{n-1}{t - (n-1)!}$.

Proof: Every n stages, the frequency $\theta_n$ is precisely obtained. The inequality follows by ignoring the (at most) n - 1 stages following the last complete n cycle. Q.E.D.
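The cyclic play during a payoff period, and the bound of Lemma 5.16, can be checked on a small case. The sketch below assumes n = 4 and the two-pair example frequencies from the text; the function names and the string labels for the moves are hypothetical.

```python
from fractions import Fraction
from math import factorial

def payoff_period_schedule(theta_n, n):
    """Pure move pairs for stages (n-1)!+1, ..., n!-1, in cycles of length n.

    theta_n maps a pair of moves to its frequency, a multiple of 1/n.
    """
    cycle = [pair for pair, freq in theta_n.items()
             for _ in range(int(freq * n))]
    assert len(cycle) == n
    start, end = factorial(n - 1) + 1, factorial(n) - 1
    return {t: cycle[(t - start) % n] for t in range(start, end + 1)}

def frequency(schedule, pair, u, v):
    """Share of the stages u, ..., v at which `pair` was played."""
    count = sum(schedule[t] == pair for t in range(u, v + 1))
    return Fraction(count, v - u + 1)

n = 4
theta_n = {("i'", "j'"): Fraction(1, n), ("i''", "j''"): Fraction(n - 1, n)}
schedule = payoff_period_schedule(theta_n, n)
start = factorial(n - 1) + 1

# Largest violation of the Lemma 5.16 bound over all stages and pairs;
# a non-positive value means the bound holds throughout the period.
worst_gap = max(
    abs(frequency(schedule, pair, start, t) - theta_n[pair])
    - Fraction(n - 1, t - factorial(n - 1))
    for t in schedule for pair in theta_n
)
```

The design mirrors the proof: after each complete cycle the empirical frequency equals $\theta_n$ exactly, so only the incomplete last cycle can contribute an error.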

Finally, we have to define the punishments: what each player does after detecting a deviation from the master plan by the other player. Two results are needed from the theory of zero-sum games (see Propositions 3.5 and 3.6; the more precise statements here are needed to obtain a uniform equilibrium).

Proposition 5.17: Assume the vector y in $R^K$ satisfies (3.3). Then player 2 has a strategy $\tau = \tau(y)$ such that

$E^k_{\sigma',\tau}(a^k_T) \le y^k + \dfrac{2M}{\sqrt{T}}$,

for all strategies $\sigma'$ of player 1, all k in K and all T in N.

Proof: The precise bound is obtained from the proof of the approachability theorem (cf. Blackwell [1956], or Mertens and Zamir [1980, Ch. I]). Q.E.D.

Proposition 5.18: For every q in $\Delta^K$, player 1 has a strategy $\sigma = \sigma(q)$ such that

$E_{\sigma,\tau',q}(\beta_T) \le (\mathrm{vex}\,\mathrm{val}_2 B)(q)$

for all strategies $\tau'$ of player 2 and all T in N.

Proof: The above inequality actually holds with $B^{\kappa}(i_t, j_t)$ instead of $\beta_T$, for all t (e.g., see Mertens and Zamir [1980, Theorem 3.15]). Q.E.D.

The definition of $\sigma$ and $\tau$ can now be completed. Assume first that player 1 deviated from the master plan, either by playing $i_t \ne i', i''$ at some t = n! or by not playing the "right" $i_t$ at some (n - 1)! < t < n!. Let D be the stage at which this deviation of player 1 occurred. Thus, all moves in $h_D$ are according to the master plan, and $i_D$ is the deviation move (which is observed by player 2 before stage D + 1). Let $z_n = \zeta(h_D)$ be the corresponding node just before the deviation; the strategy $\tau$ prescribes then that after $h_{D+1}$ (i.e., from stage D + 1 on), player 2 should use $\tau(y)$ with $y = f_n(z_n)$ (see Proposition 5.17, and note that (3.3) is satisfied in view of Proposition 3.16(i)).

Next, assume player 2 deviated from the master plan at stage D (and was detected). From stage D + 1 on, player 1 then uses $\sigma(q)$ with $q = p_n(z_n)$ as defined in Proposition 5.18 (again, $z_n = \zeta(h_D)$).

This ends the definition of the pair of strategies $\sigma$ and $\tau$.
5.4  Payoffs and Probabilities for $(\sigma,\tau)$

In this subsection we assume that both players use $\sigma$ and $\tau$, respectively. Thus, only the master plan matters; there are no deviations and no punishments.

We first analyze the payoffs. Let $T \in N$, (n - 1)! < T < n!, and let $h_T$ in $H_T$ be a history possible under $(\sigma,\tau)$; i.e., $P_{\sigma,\tau,p}(h_T) > 0$. We will write $\theta_n$ for $\theta_n(\zeta(h_T))$, the value of $\theta_n$ on the atom $\zeta(h_T)$ of $\mathcal{Z}_n$; similarly for the other random variables defined on Z. Recalling definition (5.15) of the frequencies $\phi$, we have

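Proposition 5.19: Let $T \in N$, (n - 1)! < T < n!, $h_T \in H_T$ with $P_{\sigma,\tau,p}(h_T) > 0$. Then, for all i in I and j in J,

$\left| \phi_1^{T-1}(i,j) - \left[ \left(1 - \dfrac{(n-1)!}{T-1}\right) \theta_n(i,j) + \dfrac{(n-1)!}{T-1}\, \theta_{n-1}(i,j) \right] \right| \le \dfrac{4}{n}$.

Proof: If $4/n \ge 1$, there is nothing to prove (both $\phi$ and the expression [...] lie in the interval [0,1]). Let $n \ge 5$, then

$\phi_1^{T-1} = \dfrac{(n-1)!}{T-1}\, \phi_1^{(n-1)!} + \dfrac{(T-1) - (n-1)!}{T-1}\, \phi_{(n-1)!+1}^{T-1}$.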

The frequency $\theta_{n-1}$ is "played" at stages (n - 2)! < t < (n - 1)!, therefore

$\bigl|\phi_1^{(n-1)!}(i,j) - \theta_{n-1}(i,j)\bigr| \le \dfrac{(n-2)! + 1}{(n-1)!}$
(5.6)  $A^k(\theta_\infty) \le f^k_\infty$, hence

Similarly for $\theta_{n-1}$, and we obtain

$A^k(\phi_1^{T-1}) \le \lambda_T f^k_n + \bar\lambda_T f^k_{n-1} + \dfrac{5M}{n}$,

from which the result follows. Q.E.D.

Corollary 5.21: For all q in $\Delta^K$,

$q \cdot B(\phi_1^{T-1}) \le \max\{q \cdot B(\theta_n),\ q \cdot B(\theta_{n-1})\} + \dfrac{4M}{n}$.

Proof: Immediate from Proposition 5.19. Q.E.D.

Next we deal with the probability $P_{\sigma,\tau,p}$ and the induced posteriors.

Proposition 5.22: Let $n \in N$, $z_n$ an atom of $\mathcal{Z}_n$ and $T \in N$, (n - 1)! < T < n!. Then:

(5.23)  $P_{\sigma,\tau,p}(\zeta(h_T) = z_n) = Q(z_n)$,

(5.24)  $P_{\sigma,\tau,p}(\kappa = k \mid h_T) = p^k_n(z_n)$ for all $h_T$ in $H_T$ with $\zeta(h_T) = z_n$ and all k in K.

Proof: Induction on n. For n = 1, there is only one history $h_1$ (the "empty" history), thus (5.23) is just 1 = 1 and (5.24) is $p = p_1$ (recall (5.1)). The induction now proceeds as follows.

At stages (n - 1)! < t < n! (payoff stages), neither $\zeta$ nor any probabilities change (both players make pure choices). At t = n!, the probabilities of $\zeta(h_{t+1}) = z'_{n+1}$ or $z''_{n+1}$ are 1/2 each by Lemmata 5.13 and 5.14, thus equal to $Q(z'_{n+1} \mid z_n) = Q(z''_{n+1} \mid z_n)$ by (5.3) (recall that our G-process is now assumed standard). As for the posterior probabilities (of $\kappa = k$), they change only when there is signalling (t = n! and case (i)), and we use again Lemma 5.13. Q.E.D.

We will now show that $(\sigma,\tau)$ result in the payoffs $(a,\beta)$. We need first the following result ($E^k_{\sigma,\tau}$ is the conditional expectation given $\kappa = k$, and $f^k_n = f^k_n(\zeta(h_T))$).

Proposition 5.25: Let $T \in N$, (n - 1)! < T < n! and $k \in K$. Then

$E^k_{\sigma,\tau}(f^k_n) = a^k$.

Proof: The probability distribution of $z_n = \zeta(h_T)$ induced by $P_{\sigma,\tau,p}$ is precisely Q (by (5.23)); therefore

$E_{\sigma,\tau,p}(f^k_n) = E_{(Z)}(f^k_n)$,

where $E_{(Z)}$ denotes expectation on the space Z (with respect to Q; note that $E_{\sigma,\tau,p}$ and $E^k_{\sigma,\tau}$ are on $\Omega$). Since $\{f^k_n\}_n$ is a martingale and $f^k_1 = a^k$ by (5.1), the above equals $a^k$.

We claim that the same expectation is obtained when using $P^k_{\sigma,\tau}$ instead of $P_{\sigma,\tau,p}$. Indeed, the induced probability distributions over the tree differ only in case of signalling (in a jointly controlled lottery, it is 1/2 in both cases by Lemma 5.14); however, in that case $f^k_{m+1} = f^k_m$, so that the expectation is the same. Q.E.D.


Remark: Actually, the conditional expectations (with respect to P and $P^k$) are also the same; thus, $\{f^k_n\}_n$ is a martingale also with respect to the probability distribution induced by $P^k$. Moreover, any strategy $\sigma'$ of player 1 that differs from $\sigma$ only in the probabilities used for signalling has this property (as we shall see in the next subsection).


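Proposition 5.26: For all k in K,

$\lim_{T\to\infty} E^k_{\sigma,\tau}(a^k_T) = a^k$,

$\lim_{T\to\infty} E_{\sigma,\tau,p}(\beta_T) = \beta$.

Proof: We start with player 2. Let (n - 1)! < T < n!; conditioning on $\mathcal{H}_T$, we obtain ($E = E_{\sigma,\tau,p}$):

$E(\beta_{T-1} \mid \mathcal{H}_T) = \sum_{k \in K} p^k_n B^k(\phi_1^{T-1})$,

with $p^k_n = p^k_n(\zeta(h_T))$ as usual. By Proposition 5.19,

$\bigl| B^k(\phi_1^{T-1}) - [\lambda_T B^k(\theta_n) + \bar\lambda_T B^k(\theta_{n-1})] \bigr| \le \dfrac{4M}{n}$,

where $\bar\lambda_T = (n-1)!/(T-1)$ and $\lambda_T = 1 - \bar\lambda_T$. As in the proof of Proposition 5.25, the distribution of $z_n = \zeta(h_T)$ induced by $P = P_{\sigma,\tau,p}$ is Q (see (5.23)). Therefore,

$\bigl| E(\beta_{T-1}) - E_{(Z)}\bigl(p_n \cdot [\lambda_T B(\theta_n) + \bar\lambda_T B(\theta_{n-1})]\bigr) \bigr| \le \dfrac{4M}{n}$.

As $n \to \infty$, $\theta_n \to \theta_\infty$ and $\theta_{n-1} \to \theta_\infty$ Q-a.s. by (5.11); also $p_n \to p_\infty$, hence ($\lambda_T, \bar\lambda_T \ge 0$, $\lambda_T + \bar\lambda_T = 1$ and everything is bounded):

$\lim_{T\to\infty} E(\beta_{T-1}) = E_{(Z)}(p_\infty \cdot B(\theta_\infty))$.

By (5.7), this is $E_{(Z)}(\delta_\infty) = \delta_1 = \beta$ (recall (5.1)).

For player 1, the same argument gives ($a_T$ was defined in (4.16))

$\lim_{T\to\infty} E(a_{T-1}) = E_{(Z)}(p_\infty \cdot A(\theta_\infty)) = E_{(Z)}(p_\infty \cdot f_\infty) = p_1 \cdot f_1 = p \cdot a$

(see (5.6), Proposition 3.18 and (5.1)). If we condition on $\kappa$, we have

$E(a_{T-1}) = \sum_{k \in K} p^k E^k(a^k_{T-1})$,

where $E^k = E^k_{\sigma,\tau} = E_{\sigma,\tau,p}(\,\cdot \mid \kappa = k)$. As in the proof of Corollary 5.20, we obtain

$E^k(a^k_{T-1}) \le \lambda_T E^k(f^k_n) + \bar\lambda_T E^k(f^k_{n-1}) + \dfrac{5M}{n}$.

By Proposition 5.25,

$E^k(a^k_{T-1}) \le \lambda_T a^k + \bar\lambda_T a^k + \dfrac{5M}{n}$,

hence

$\limsup_{T\to\infty} E^k(a^k_T) \le a^k$,

which together with

$\lim_{T\to\infty} \sum_{k \in K} p^k E^k(a^k_T) = \sum_{k \in K} p^k a^k$

and $p^k > 0$ for all k completes the proof. Q.E.D.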

Remark: The above proof actually shows that, if both players use $(\sigma,\tau)$, then the average payoffs converge a.s. to those corresponding to $\theta_\infty = \lim_{n\to\infty} \theta_n$, where $\theta_n = \theta_n(\zeta(h_T))$. Therefore, one can interchange the order of limit (as $T \to \infty$) and expectation (and there is no need for Banach limits!).

At this point we can show intuitively that $(\sigma,\tau)$ is indeed an equilibrium point (at least in the weak sense (2.7)-(2.8); the complicated inequalities in the next two subsections are in part due to the fact that we want to prove the uniform property (2.11)-(2.12)).
Consider player 1, and fix k in K (the true game). As we noted in the last Remark, the average payoff to player 1 for infinite histories without deviation from the master plan is the corresponding $A^k(\theta_\infty)$, which is $\le f^k_\infty$. If the game has proceeded up to a point corresponding to the node $z_n$ in the tree, his expected payoff will thus be at most $E^k(f^k_\infty \mid z_n) = f^k_n$ (see Proposition 5.25 and the Remark following it). This will also be the expected payoff if player 1 decides to deviate now, by Proposition 5.17. This shows that he cannot gain by detectable deviations. How about undetected ones? He can only make those at communication stages (at payoff stages, the moves are pure). If a jointly controlled lottery is performed, he cannot influence the outcome: the two alternatives have probability 1/2 each no matter what player 1 chooses (since player 2 randomizes truly according to $\tau$). If he is in a signalling case, then $f^k_{n+1} = f^k_n$; thus $f^k_{n+1}$ is constant, and any "signal" he uses gives him the same expectation. Therefore undetectable deviations do not help either, and $\sigma$ is optimal against $\tau$.

Consider now player 2. Since player 1 uses $\sigma$, the posterior probabilities are given by $p_n$. Therefore the expected average payoff of player 2 at a node $z_n$, if he does not deviate, is precisely $p_n \cdot E(B(\theta_\infty) \mid z_n)$, which for n large enough is close to $E(p_\infty \cdot B(\theta_\infty) \mid z_n) = E(\delta_\infty \mid z_n) = \delta_n$ (since $p_n \to p_\infty$). If he makes a detectable deviation, he will get thereafter at most $(\mathrm{vex}\,\mathrm{val}_2 B)(p_n) \le \delta_n$; thus he cannot gain by doing so. The only other possible change in strategy is in a jointly controlled lottery; again, if player 1 uses $\sigma$, player 2 cannot influence the resulting probabilities. Thus $\tau$ is a best response against $\sigma$.

5.5  $\sigma$ is Optimal Against $\tau$

We will show here that $\sigma$ is a best response of player 1 against the strategy $\tau$ of player 2. Moreover, the uniform condition (2.11) will be proved.

Thus, let $\varepsilon > 0$; we have to find $T_0 = T_0(\varepsilon)$ such that for all $T \ge T_0$ and all $\sigma'$

$E^k_{\sigma',\tau}(a^k_T) \le a^k + \varepsilon$

for all k in K (see (2.13) and Proposition 5.26).

As usual, P, $P^k$, E and $E^k$ refer to $\sigma,\tau,p$, whereas P', $P'^k$, E' and $E'^k$ to $\sigma',\tau,p$.

If both players use $(\sigma,\tau)$, there are no deviations from the master plan, and $\zeta$ is defined for all possible histories (i.e., those with positive P). However, when we consider alternative strategies, it will be useful to define $\zeta$ for all histories (i.e., even those that are not possible under $(\sigma,\tau)$); $\zeta(h_t)$ will be $\zeta$ of the part of $h_t$ up to the first stage a (detectable) deviation occurred. Thus, we define

(5.27)  $D = \sup\,\{t \in N : P(h_t) > 0\}$.
D is a random variable on $\Omega$ with values in $N \cup \{\infty\}$, and is $\mathcal{H}_\infty$-measurable. For every infinite history, D is the stage of the first detectable deviation, if any; $D = \infty$ otherwise. Note that $P(h_t) > 0$ just means that the sequence of moves used by both players at stages 1,2,..., t - 1 is possible under $(\sigma,\tau)$; more precisely, that $h_t$ is possible under $(\sigma,\tau)$ when $\kappa = k$ for some k in K.

Let $D \wedge t = \min\{D,t\}$, then we define for all t in N and $h_t$ in $H_t$

(5.28)  $\zeta(h_t) = \zeta(h_{D \wedge t})$.

The right-hand side was given in subsection 5.3; we thus extend the definition of $\zeta$ to all histories.

Next, we "translate" the G-process to the space $\Omega$, as follows:

(5.29)  $g_t(h_t) = g_m(z_m)$,

where $z_m = \zeta(h_t)$ (and thus, by (5.28), we have $(m-1)! < D \wedge t \le m!$). As usual, $g_t = (f_t, \delta_t, p_t)$, with $f_t = (f^k_t)_{k \in K}$ in $R^K$, $\delta_t$ in R and $p_t = (p^k_t)_{k \in K}$ in $\Delta^K$.

Proposition 5.30: Let $k \in K$. The sequence $\{f^k_t\}_{t=1}^{\infty}$ is a martingale on $(\Omega, \mathcal{H}_\infty, P'^k)$ with respect to $\{\mathcal{H}_t\}_{t=1}^{\infty}$. Moreover, for all t in N, $h_t$ in $H_t$ and all $i_t$ in I,

(5.31)  $E'^k(f^k_{t+1} \mid h_t, i_t) = f^k_t(h_t)$  $P'^k$-a.s.

This proposition is a crucial assertion in our proof. (5.31) means that the strategies $\sigma$ and $\tau$ have been constructed in such a way that player 1 is indifferent among his various choices of $i_t$ at all histories $h_t$, and this includes both detectable and undetectable deviations from $\sigma$ (see also Proposition 5.25 and the subsequent Remark).

Proof: The measurability of $f^k_t$ with respect to $\mathcal{H}_t$ is immediate by definition. As for the martingale property, namely $E'^k(f^k_{t+1} \mid \mathcal{H}_t) = f^k_t$, it will follow from (5.31) (which is stronger, since it holds for all $i_t$ in I, not only in the average).

We now prove (5.31). It is easy to see by (5.29) that $f^k_{t+1} \ne f^k_t$ only when $D \wedge t = m!$ and $D \wedge (t+1) = m! + 1$, and thus $D \ge t + 1$ and t = m!. Let $z_m = \zeta(h_t)$; since $f^k_{m+1} \ne f^k_m$, case (ii), a jointly controlled lottery, occurs at $z_m$. But $i_t$ must be either i' or i'' (otherwise, player 1 deviated and D = t); in both instances, $\zeta(h_{t+1})$ is $z'_{m+1}$ or $z''_{m+1}$ with probability 1/2 each by Lemma 5.14, and (5.31) reduces to

$f^k_m(z_m) = \dfrac{1}{2} f^k_{m+1}(z'_{m+1}) + \dfrac{1}{2} f^k_{m+1}(z''_{m+1})$,

which holds by (5.3). Q.E.D.

For each $T$ in $N$, we define $\mathcal{H}_{D \wedge T}$ to be the finite field generated by all events of the form 13/

$\{h_t \text{ and } D \wedge T \ge t\}$

for $t$ in $N$, $t \le T$ and $h_t$ in $H_t$. This is the field of events prior to the first detectable deviation. Note that $D + 1$ (but not $D$) is a stopping time relative to the sequence $\{\mathcal{H}_t\}_{t=1}^{\infty}$, and so is $(D+1) \wedge (T+1) = (D \wedge T) + 1$. The field of events strictly before $(D \wedge T) + 1$ is precisely $\mathcal{H}_{D \wedge T}$. It is easy to see that $\mathcal{H}_{D \wedge T} \subset \mathcal{H}_T$, and an atom in $\mathcal{H}_{D \wedge T}$, which we denote by $h_{D \wedge T}$, is of the form

$h_{D \wedge T} = \{h_t \text{ and } D \wedge T = t\}$

for some $t$ in $N$, $t \le T$ and $h_t$ in $H_t$.

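From now on, we fix a strategy $\sigma'$ of player 1, an element $k$ in $K$, and $T$ in $N$. To shorten notation, we will write $D$ for $D \wedge T$ and $\mathcal{H}_D$ for $\mathcal{H}_{D \wedge T}$.

Consider $E'^k(a_{T-1}^k)$; we separate it into three parts: before $D$, at $D$, and after $D$ (note that only the first one is always non-empty). Thus,

$a_{T-1}^k = \frac{D-1}{T-1} a_{D-1}^k + \frac{1}{T-1} A^k(i_D, j_D) + \frac{1}{T-1} \sum_{t=D+1}^{T-1} A^k(i_t, j_t)$.

The middle term is at most $M/(T-1)$, hence

(5.32)    $E'^k(a_{T-1}^k) \le E'^k\left(\frac{D-1}{T-1} a_{D-1}^k\right) + \frac{M}{T-1} + E'^k\left(\frac{1}{T-1} \sum_{t=D+1}^{T-1} A^k(i_t, j_t)\right)$.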


For the first term, we have

Lemma 5.33:

$E'^k\left(\frac{D-1}{T-1} a_{D-1}^k\right) \le E'^k\left(\frac{D-1}{T-1} \tilde a_T^k\right) + \frac{5M + 2M}{n-1}$.


Proof: We use Corollary 5.20:


(5.3U)  <  E'k(§-^)  ♦  - 


5ME'k(— 


1  1 


T  -  1  m 


where $a_m^k$ and $a_{m-1}^k$ are evaluated at the corresponding $\zeta(h_D)$ (note that $m$ here is a random variable, $(m-1)! < D \le m!$).

By definitions (5.28) and (5.29), the first term is precisely $E'^k\left(\frac{D-1}{T-1} \tilde a_T^k\right)$. The second term is separated into two parts. If $D \le (n-1)!$, then $m \le n-1$, hence

$\frac{(m-1)!}{T-1} \le \frac{(n-2)!}{(n-1)!} = \frac{1}{n-1}$,

giving a bound of $2M/(n-1)$. Next, we claim that

where $\chi_{\{\ldots\}}$ denotes the indicator function of the event $\{\ldots\}$.

Let $u = (n-1)!$; then $D > u$ if and only if $P(h_{u+1}) > 0$ (see the definition (5.27) of $D$). But player 2 does not deviate from $\tau$, therefore we have $P'^k$-a.s.: $P(h_{u+1}) > 0$ if and only if $P(h_u, i_u) > 0$. Conditioning on $h_u$ and $i_u$ gives


X{P(hu,iu)>0} 


By (5.29), $a_{n-1}^k = \tilde a_u^k$ and $a_n^k = \tilde a_{u+1}^k$; recalling (5.31) shows that the whole expression is zero.

The last term in (5.34) is also separated into two: for $D \le (n-2)!$,

$\frac{D-1}{T-1} \cdot \frac{1}{m} \le \frac{(n-2)!}{(n-1)!} = \frac{1}{n-1}$;

for $D > (n-2)!$, $m \ge n-1$ and

$\frac{D-1}{T-1} \cdot \frac{1}{m} \le \frac{1}{m} \le \frac{1}{n-1}$.

This gives a bound of $5M/(n-1)$; together with $2M/(n-1)$ from the second term, the proof is completed. Q.E.D.
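
The two elementary estimates used in this proof, namely that $(m-1)!/(T-1)$ is at most $1/(n-1)$ when $D \le (n-1)!$ and that $\frac{D-1}{T-1} \cdot \frac{1}{m}$ is at most $1/(n-1)$ in all cases, are easy to check by brute force. The sketch below is illustrative only: `period_index` and `check_bounds` are hypothetical helper names, and $n$ is assumed to be the index with $(n-1)! < T \le n!$, as in subsection 5.6.

```python
from fractions import Fraction
from math import factorial

def period_index(t):
    """Return m with (m-1)! < t <= m!, for a stage t >= 2."""
    if t < 2:
        raise ValueError("no m satisfies (m-1)! < t <= m! when t < 2")
    m = 2
    while factorial(m) < t:
        m += 1
    return m

def check_bounds(T):
    """Check both estimates for every D with 2 <= D <= T; return 1/(n-1)."""
    n = period_index(T)
    bound = Fraction(1, n - 1)
    for D in range(2, T + 1):
        m = period_index(D)
        if D <= factorial(n - 1):
            # estimate used for the second term
            assert Fraction(factorial(m - 1), T - 1) <= bound
        # estimate used for the last term
        assert Fraction(D - 1, T - 1) * Fraction(1, m) <= bound
    return bound
```

For instance, `check_bounds(30)` runs through all $D$ between 2 and 30 with $n = 5$ and returns the common bound 1/4.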


For the last term in (5.32), we condition on $\mathcal{H}_D$.

Lemma 5.35:

$E'^k\left(\frac{1}{T-1} \sum_{t=D+1}^{T-1} A^k(i_t, j_t) \,\Big|\, \mathcal{H}_D\right) \le \frac{T-1-D}{T-1} \tilde a_T^k + \frac{2M}{\sqrt{T-1}}$.

Proof: Given $h_D$, player 2 uses the punishment strategy associated with $y$ starting at $t = D+1$, with $y = a_m(\zeta(h_D)) = \tilde a_T$ (see (5.29)). The inequality is obtained from Proposition 5.17, applied to

$\frac{1}{T-1-D} \sum_{t=D+1}^{T-1} A^k(i_t, j_t)$.

Q.E.D.


Finally, we have our result.

Proposition 5.36: For every $\varepsilon > 0$ there exists $T_0 = T_0(\varepsilon)$ such that for all $T \ge T_0$ and all $\sigma'$,

$E'^k(a_T^k) \le a^k + \varepsilon$

for all $k$ in $K$.


Proof: Combining the inequalities in Lemmata 5.33 and 5.35, we obtain from (5.32)

$E'^k(a_{T-1}^k) \le E'^k\left(\frac{T-2}{T-1} \tilde a_T^k\right) + \frac{M}{T-1} + \frac{5M + 2M}{n-1} + \frac{2M}{\sqrt{T-1}}$.

The first term differs from $E'^k(\tilde a_T^k) = \tilde a_1^k = a^k$ (see Proposition 5.30 and (5.1)) by at most another $M/(T-1)$. All additional terms are independent of $\sigma'$, and converge to zero as $T \to \infty$ (hence, $n \to \infty$ too). Q.E.D.


5.6 $\tau$ is Optimal Against $\sigma$

Here we prove that $\tau$ is a best response of player 2 against player 1 using $\sigma$; as in subsection 5.5, we obtain the uniform property (2.12): Given $\varepsilon > 0$, we show that there is $T_0 = T_0(\varepsilon)$ such that for all $T \ge T_0$ and all $\tau'$,

$E_{\sigma,\tau',p}(\beta_T) \le \beta + \varepsilon$

(recall Proposition 5.26). We will use the notations $E'$ and $P'$ for $E_{\sigma,\tau',p}$ and $P_{\sigma,\tau',p}$.

In subsection 5.5 the time $D$ of the first detectable deviation was defined (see (5.27)); also, the G-process was translated to the space of histories by (5.28) and (5.29). Thus, $\tilde\beta_t(h_t)$ is the value of the sequence $\{\beta_m(z_m)\}_{m=1}^{\infty}$ just before the deviation (if any, up to stage $t$). We have

Proposition 5.37: The sequence $\{\tilde\beta_t\}_{t=1}^{\infty}$ is a martingale on $(\Omega, \mathcal{H}_\infty, P')$ with respect to $\{\mathcal{H}_t\}_{t=1}^{\infty}$.

Proof: Similar to that of Proposition 5.30. Again, we actually prove a stronger assertion (which, however, will not be needed in the sequel), namely

$E'(\tilde\beta_{t+1} \mid h_t, j_t = j) = \tilde\beta_t(h_t)$

$P'$-a.s., for all $t$ in $N$, $h_t$ in $H_t$ and $j$ in $J$. The only case to check is $t = m!$ and $D \ge t+1$. The probabilities of $z'_{m+1}$ and $z''_{m+1}$ are each 1/2 no matter what player 2 chooses at stage $t$; this is so by definition of $\sigma$ in case (i) (signalling), and by Lemma 5.14 in case (ii) (a jointly controlled lottery; since $D \ge t+1$, $j = j'$ or $j''$).

An important property of $\sigma$ is that player 2 cannot increase the probability of reaching any node in the tree (he may be able to decrease it by making detectable deviations in previous stages).

Proposition 5.38: Let $t \in N$, $(n-1)! < t \le n!$, and let $z_n$ be an atom of $\mathcal{Z}_n$. Then

$P'(\zeta(h_t) = z_n) \le Q(z_n)$.

Proof: Induction on $n$. For $n = 1$ we clearly have equality. Let $u = (n-1)!$; then definition (5.28) gives

$P'(\zeta(h_t) = z_n) = P'(\zeta(h_{u+1}) = z_n \text{ and } D \ge u+1)$

$= P'(\zeta(h_{u+1}) = z_n \mid \zeta(h_u) = z_{n-1} \text{ and } D \ge u+1) \cdot P'(D \ge u+1 \mid \zeta(h_u) = z_{n-1}) \cdot P'(\zeta(h_u) = z_{n-1})$.

The same argument as in the proof of Proposition 5.37 shows that the first factor is 1/2 (since $D \ge u+1$, in both cases (i) and (ii) the probabilities do not change); the second is at most 1, and the third at most $Q(z_{n-1})$ by induction. This completes the proof, since $Q(z_n) = (1/2) \cdot Q(z_{n-1})$ by (5.3). Q.E.D.

Let $\tau'$ be a fixed strategy of player 2, and fix $T$ in $N$, $(n-1)! < T \le n!$. As in subsection 5.5, we divide $E'(\beta_{T-1})$ into three parts, as follows ($D$ stands for $D \wedge T$):


(indeed: 


Clearly  n  is  Z  -measurable,  and  n  -*■  &  Q-a.s.  as  n  -*■  » 
n  n  n  “ 

Pn  -*■  9n  ">•  0^  by  (5-11),  and  6^  =  p^  •  Bte^)  by  (5  ")).  We  also 

have  6  -*■  6  as  n  <*>,  therefore 

n  » 


lim  E 

D« 


(z) 


0 


(everything  is  bounded  by  M).  Thus,  for  every  e  >  0  there  is 
n.^  =  n  (e)  large  enough  such  that 

E(Z)(I\  -  !J>  *  e 
for  all  n  >  n^. 

Using Lemma 5.41 and then Corollary 5.21 (with $q = p_m$), we obtain by (5.40)


e,<FTTbt-i)  1 


E’  ( 


D  - 


T  - 


lnm} 


+  i+ME'  ( 


D  -  1 


T  - 


ib 


where $(m-1)! < D \le m!$, and $\eta_m = \eta_m(\zeta(h_D))$. As in the proof of Lemma 5.33, the last term is no more than $4M/(n-1)$, which can be made arbitrarily small for large enough $T$ (independent of $\tau'$). Therefore, it remains to bound


(recall that $\tilde\beta_T = \beta_m(\zeta(h_D))$). We separate into three parts: $m < n-1$, $m = n-1$, and $m = n$. The first one is bounded by $2M/(n-1)$ (since $D \le (n-2)!$). Let $z_n$ be an atom of $\mathcal{Z}_n$; then

$P'(\zeta(h_D) = z_n) = P'(\zeta(h_T) = z_n) \le Q(z_n)$

by (5.28) and Proposition 5.38. This implies that


E,(rr~lK  '  SJ  ‘  xtm=n}  >  i  E(Z)^nn  -  ‘n11  ' 


Similarly, let $z_{n-1}$ be an atom of $\mathcal{Z}_{n-1}$; then


P'(^(hD)  =  zn_1)  =  p'(c(hn!)  =  zn_x  and  D  <  n!)  <  QU^) 


and 


e'(FTtI'1„  -  5J  •  X{»=n-1))  i  E(Z)(IVl  -  'n-l11 


If $n \ge n_1(\varepsilon)$, both expectations are bounded by $\varepsilon$, which completes the proof. Q.E.D.

For the last term in (5.39), we again condition on $\mathcal{H}_D$.

Lemma 5.43:


E’(fTT  T  B<(i  ,j  )|H  ) 

1  1  t»D+l  t  *  u 


Proof: By Lemma 5.41, the posteriors at $h_D$ are given by $p_m = p_m(\zeta(h_D))$. From stage $D+1$ on, player 1 uses his punishment strategy; by Proposition 5.18, the expression we consider is thus no more than

$\frac{T-1-D}{T-1} (\mathrm{vex}\, \mathrm{val}_2 B)(p_m) \le \frac{T-1-D}{T-1} \beta_m$

(we used Proposition 3.16 (ii)). But $\tilde\beta_T = \beta_m(\zeta(h_D))$, completing the proof. Q.E.D.


Proposition 5.44: For every $\varepsilon > 0$ there exists $T_0 = T_0(\varepsilon)$ such that for all $T \ge T_0$ and all $\tau'$,

$E'(\beta_T) \le \beta + \varepsilon$.

Proof: Similarly to Proposition 5.36, we combine (5.39), Lemmata 5.42 and 5.43, Proposition 5.37, and (5.1). Q.E.D.

We have completed the proof of the second half of our main result.

Proposition 5.45: Let $(a, \beta, p) \in G^*$. Then there exists a uniform equilibrium point $(\sigma, \tau)$ in $\Gamma_\infty(p)$ with payoffs $(a, \beta)$.

Proof: Propositions 5.26, 5.36 and 5.44. Q.E.D.


6.  Enforceable  Joint  Plans 

Let us consider now equilibria that require finite sequences of communications. For every positive integer $m$, let

$G^m = \{g \in G^*$: there exists a G-process $\{g_n\}_{n=1}^{\infty}$ starting at $g$ such that $g_n = g_m$ for all $n \ge m\}$.

Thus, $G^m$ corresponds to those G-processes for which the limit $g_\infty$ is reached already at stage $m$. Clearly, $G^1 = G$ (recall (3.12)). Therefore, the first such set to study is $G^2$.

The following is easily obtained: A point $g = (a, \beta, p)$ belongs to $G^2$ if and only if $g$ can be expressed as a convex combination of points in $G$, all of which have the same $a$ coordinate or the same $p$ coordinate. Thus, there is a finite set $S$ such that $g = \sum_{s \in S} \rho(s) g(s)$, with $\rho = (\rho(s))_{s \in S}$ in $\Delta^S$, $g(s) = (a(s), \beta(s), p(s))$ in $G$ for all $s$ in $S$, and either $a(s) = a$ for all $s$ or $p(s) = p$ for all $s$.

The latter case ($p(s) = p$ for all $s$) leads to no additional points outside $G$; this is due to the fact that, for a fixed $p$, the set of $(a, \beta)$ such that $(a, \beta, p)$ belongs to $G$ is a convex set (indeed, all conditions (3.3), (3.4), (3.9)-(3.11) are invariant under convex combinations, again when $p$ is constant).

Therefore, the only interesting case is $a(s) = a$ for all $s$ (and $p(s)$ not constant). This generates points in $G^2$ that do not
necessarily  belong  to  G,  and  that  correspond  to  equilibria  with  one 
communication  only^-^  (signalling),  followed  by  payoff  accumulation 
henceforth  (using  frequencies).  Following  Aumann,  Maschler  and  Stearns 
[1968],  this  is  called  an  enforceable  joint  plan.^^ 

An interesting question is: how many different signals are needed? Since the only information player 1 has (that player 2 has not) is the value of $k$, it seems reasonable that no more than $|K|$ signals should be required. Namely, the most player 1 can transmit to player 2 is just $k$, which has $|K|$ possible values. However, it turns out that this is not the case, and the correct bound is $|K| + 1$ rather than $|K|$; i.e., no more than $|K| + 1$ signals are needed, and there are examples which do indeed require $|K| + 1$.

For every integer $\ell$, let $G^2(\ell)$ be the set of all $g = (a, \beta, p)$ in $G^2$ such that

$g = \sum_{s=1}^{\ell} \rho(s) g(s)$, $\sum_{s=1}^{\ell} \rho(s) = 1$, $g(s) = (a, \beta(s), p(s)) \in G$

and $\rho(s) \ge 0$ for all $s = 1, 2, \ldots, \ell$.

Proposition 6.1:

$G^2 = G^2(|K| + 1)$.

Proof: For fixed $a$, the vector $(\beta, p)$ lies in $R \times \Delta^K$, which is a $|K|$-dimensional Euclidean space; we now apply Carathéodory's Theorem. Q.E.D.

We will next present an example where $G^2 \ne G^2(|K|)$, showing that $|K| + 1$ is the best bound.

Example 6.2: Let $K = \{1, 2\}$, $I = \{1, 2\}$, $J = \{1, 2, 3, 4, 5, 6, 7\}$. The two games are (player 1 chooses the row, player 2 the column):


0,0 

0,4 

0,-5 

0,0 

0,4 

O 

** 

1 

vn 

1 

2 

3 

0,0 

0,-5 

0,4 

0,0 

0,-5 

0,4 

It is easy to see that $(\mathrm{val}_1 A)(p) = -1$ for all $p$ in $\Delta^K$, and $(\mathrm{vex}\, \mathrm{val}_2 B)(p) = (\mathrm{val}_2 B)(p) = \max\{-9p^1 + 6p^2, -3p^1 + 3p^2, 3p^1 - 3p^2, 6p^1 - 9p^2\}$, where $p = (p^1, p^2)$. Therefore, the intersection of $G$ with the hyperplane $a = (0, 0)$ consists of exactly three points:

$g(1) = ((0,0), 0, (1/2, 1/2))$,

$g(2) = ((0,0), 1, (2/3, 1/3))$,

$g(3) = ((0,0), 1, (1/3, 2/3))$,

where we write as usual $g = ((a^1, a^2), \beta, (p^1, p^2))$; these three points correspond to $j = 1$, $j = 2$ and $j = 3$, respectively ($i$ does not matter). Indeed, since $a = (0, 0)$, $j = 4, 5, 6$ and 7 are not possible; individual rationality for player 2 (namely (3.4)) then implies that $j = 1$ can be used only at $p = (1/2, 1/2)$, $j = 2$ only at $p = (2/3, 1/3)$, and $j = 3$ only at $p = (1/3, 2/3)$.

Therefore $G^2$ will contain the convex hull of $g(1)$, $g(2)$ and $g(3)$; however, no interior point of this triangle can be expressed using only two of its vertices.
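
The individual rationality claim of this example can be checked with exact arithmetic. In the sketch below, the formula for $\mathrm{val}_2 B$ is the one stated in the text, while the payoffs of player 2 in columns $j = 1, 2, 3$ (taken here as $(0,0)$, $(4,-5)$ and $(-5,4)$ in games 1 and 2) are an assumption read off the partly illegible payoff matrices; the function names and the grid size are likewise illustrative.

```python
from fractions import Fraction

def val2_B(p1):
    """(val_2 B)(p) for p = (p1, 1 - p1), using the formula stated in the text."""
    p2 = 1 - p1
    return max(-9 * p1 + 6 * p2, -3 * p1 + 3 * p2, 3 * p1 - 3 * p2, 6 * p1 - 9 * p2)

# ASSUMED payoffs of player 2 (game 1, game 2) in columns j = 1, 2, 3.
COLUMN_PAYOFFS = {1: (0, 0), 2: (4, -5), 3: (-5, 4)}

def rational_priors(j, grid=60):
    """Grid points p1 = i/grid at which column j gives player 2 at least val_2 B."""
    b1, b2 = COLUMN_PAYOFFS[j]
    points = (Fraction(i, grid) for i in range(grid + 1))
    return [p1 for p1 in points if b1 * p1 + b2 * (1 - p1) >= val2_B(p1)]

# For each column, the values of p^1 on the grid where it is individually rational.
result = {j: rational_priors(j) for j in COLUMN_PAYOFFS}
```

Under these assumptions, each of the three columns survives at a single grid point, $p^1 = 1/2$, $2/3$ and $1/3$ respectively, in agreement with the three points $g(1)$, $g(2)$ and $g(3)$.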

It is easily seen in this example that an additional condition may reduce the number of signals to $2 = |K|$. In general, we have

Proposition 6.3: Let $(a, \beta, p) \in G^2$. Then there exists $\beta'$ in $R$ such that $\beta' \ge \beta$ and $(a, \beta', p) \in G^2(|K|)$.

Proof: By Proposition 6.1,

$g = (a, \beta, p) = \sum_{s=1}^{\ell} \rho(s) g(s)$, $\sum_{s=1}^{\ell} \rho(s) = 1$,

$g(s) = (a, \beta(s), p(s)) \in G$ and $\rho(s) > 0$ for all $s = 1, 2, \ldots, \ell$, where $\ell \le |K| + 1$. Assume $\ell = |K| + 1$, and consider the $\ell$ vectors $\{(p(s), 1)\}_{s=1}^{\ell}$ in $\Delta^K \times R$. They must be linearly dependent; let $\{\pi(s)\}_{s=1}^{\ell}$ be not all zero and such that $\sum_{s=1}^{\ell} \pi(s) p(s) = 0$ and $\sum_{s=1}^{\ell} \pi(s) = 0$. Without loss of generality, we assume that $\sum_{s=1}^{\ell} \pi(s) \beta(s) \ge 0$ (otherwise, replace all $\pi(s)$ by $-\pi(s)$). Let $\eta = \min\{-\rho(s)/\pi(s) : \pi(s) < 0\}$, and put $\rho'(s) = \rho(s) + \eta \pi(s)$. Then $\rho'(s) \ge 0$ for all $s = 1, 2, \ldots, \ell$ and at least one $\rho'(s)$ is zero; moreover, $\sum_{s=1}^{\ell} \rho'(s) = 1$, $\sum_{s=1}^{\ell} \rho'(s) p(s) = p$ and $\beta' = \sum_{s=1}^{\ell} \rho'(s) \beta(s) \ge \beta$. Q.E.D.
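
The proof is constructive, and one reduction step can be sketched in code. The following is a minimal illustration under stated assumptions: the helper names `null_vector` and `drop_one_signal` are hypothetical, exact rational arithmetic is used, and the data are the posteriors and payoffs of the three points of Example 6.2 with weights (1/2, 1/4, 1/4), chosen only for illustration.

```python
from fractions import Fraction as F

def null_vector(rows):
    """Return a nonzero rational x with sum_j rows[i][j] * x[j] = 0 for all i.
    Assumes the rank of the rows is smaller than the number of columns."""
    rows = [[F(v) for v in r] for r in rows]
    n, pivots, r = len(rows[0]), [], 0
    for c in range(n):
        if r == len(rows):
            break
        piv = next((i for i in range(r, len(rows)) if rows[i][c] != 0), None)
        if piv is None:
            continue
        rows[r], rows[piv] = rows[piv], rows[r]
        rows[r] = [v / rows[r][c] for v in rows[r]]
        for i in range(len(rows)):
            if i != r and rows[i][c] != 0:
                f = rows[i][c]
                rows[i] = [a - f * b for a, b in zip(rows[i], rows[r])]
        pivots.append(c)
        r += 1
    free = next(c for c in range(n) if c not in pivots)
    x = [F(0)] * n
    x[free] = F(1)
    for i, c in enumerate(pivots):
        x[c] = -rows[i][free]
    return x

def drop_one_signal(rho, p, beta):
    """Shift the weights along a linear dependence pi until one weight reaches 0."""
    ell = len(rho)
    rows = [[p[s][k] for s in range(ell)] for k in range(len(p[0]))]
    rows.append([1] * ell)                      # sum_s pi(s) = 0
    pi = null_vector(rows)
    if sum(x * b for x, b in zip(pi, beta)) < 0:
        pi = [-x for x in pi]                   # so that beta does not decrease
    eta = min(-rho[s] / pi[s] for s in range(ell) if pi[s] < 0)
    return [rho[s] + eta * pi[s] for s in range(ell)]

p = [(F(1, 2), F(1, 2)), (F(2, 3), F(1, 3)), (F(1, 3), F(2, 3))]
beta = [0, 1, 1]
rho = [F(1, 2), F(1, 4), F(1, 4)]
new_rho = drop_one_signal(rho, p, beta)
new_beta = sum(w * b for w, b in zip(new_rho, beta))
```

With these data the first signal is dropped: the new weights are (0, 1/2, 1/2), the average posterior stays at (1/2, 1/2), and the payoff of player 2 rises from 1/2 to 1, so two signals suffice once the payoff is allowed to increase.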


The following is now immediate.

Corollary 6.5: Let $(a, \beta)$ be the payoffs of a Pareto optimal enforceable joint plan equilibrium in $\Gamma_\infty(p)$. Then no more than $|K|$ signals are needed; namely, $(a, \beta, p) \in G^2(|K|)$.

What about $G^m$ for larger values of $m$? It is easy to see that each $G^m$ is obtained from the previous $G^{m-1}$ by taking convex combinations, with either $a$ fixed (when $m$ is even) or $p$ fixed (when $m$ is odd). New points are usually obtained; Aumann, Maschler and Stearns [1968] provide examples where $G^3 \ne G^2$ and $G^4 \ne G^3$. It is probably not difficult to use the same ideas in order to generate examples where $G^m \ne G^{m-1}$ for arbitrary $m$.

An open question still remains. Is $G^*$ the union of all the $G^m$? If one ignores the game structure, and just considers the notions of bi-convexification ($G^m$) and bi-martingale ($G^*$), the answer is negative: $G^*$ may contain points that do not belong to any $G^m$ (and it is not just a matter of closure either; $G^*$ may be a very different set). For details on these problems, the reader is referred to the forthcoming paper of R. J. Aumann and the author.


Footnotes 


The standard example is the well known children's way of choosing among two alternatives with equal probability ("two-finger Morra"): they each show, simultaneously, either one or two fingers. If they match (i.e., both show the same number), the first alternative is chosen; if not, the second one. If both choose the number of fingers at random (i.e., with probabilities 1/2, 1/2), the two alternatives each have probability 1/2, even when one of the participants uses any other strategy! (This is better than tossing a coin, which may be counterfeit, a fact known to one but not to the other.) This idea of jointly controlled randomizations is due to Aumann, Maschler and Stearns [1968].

In this case, one may duplicate the single strategy of a player: this enables him to make choices, which do not affect the payoffs but serve as "signals".

The set $H_1$, being an empty product, is defined to consist of one element only.

A  finite  field  means  a  field  with  finitely  many  elements;  such  a 
field  is  equivalent  to  a  finite  partition  of  the  space  (the  atoms 
of  the  field  being  the  elements  of  the  partition). 

The  statement  is  to  be  understood  as:  [a^...]  if  and  only  if 
(an«..  for  all  n] ;  similarly  in  (ii). 

I.e.,  with  6  or  6  for  0,  and  p  or  p  for  p  in  (3*1*) 
®  8  *  8 

Henceforth  we  will  always  use  t  for  integers  in  N,  and  s  for 
half-integers  in  Ng. 

We list here only those we will need in our proofs; the existence of such L is guaranteed by the Hahn-Banach Theorem (see the reference above).

$Z$ is a set, $\mathcal{Z}$ a $\sigma$-field on $Z$, and $Q$ a probability measure on $\mathcal{Z}$.

By  "period"  we  will  usually  mean  the  stages  from  (n-l)!  to  n! 
for  some  n. 


If there are more than two strategies, the communications may be "shortened" (i.e., fewer stages). This is not important in our model, since payoffs in finitely many periods do not matter, but will be so if a fixed discount rate is assumed.

This is the set of infinite histories which coincide with $h_t$ up to time $t$, and for which $D \wedge T$ is no less than $t$.

If the G-process is standard (cf. subsection 5.1), this would require one stage in the game; in general, this may take longer (e.g., if player 1 uses only $i'$ and $i''$ as in subsection 5.3, then at least $\log_2 \ell$ stages are needed, where $\ell$ is the number of different values of $g_2$).

They only define "joint plans", and then find conditions under which these can be "enforced" by equilibria. As Sorin [1981] pointed out, one of their conditions should be slightly strengthened, and then it corresponds to our characterization of $G^2$.

I.e., such that there is no other enforceable joint plan equilibrium in $\Gamma_\infty(p)$ with payoffs $(a', \beta')$ satisfying $(a', \beta') \ge (a, \beta)$ and $(a', \beta') \ne (a, \beta)$.